(54) Title: GENOME WALKING BY SELECTIVE AMPLIFICATION OF NICK-TRANSLATE DNA LIBRARY AND
AMPLIFICATION FROM COMPLEX MIXTURES OF TEMPLATES

(57) Abstract: Improved methods and reagents for chromosome walking of nucleic acid are
discussed herein. A library of amplifiable nick translation molecules is generated, and a
chromosome walk is initiated from a known sequence in the nucleic acid by producing at least
one nick translate molecule, sequencing part of the nick translate molecule, and producing a
second nick translate molecule by initiating the primer extension from the region of the
obtained sequence of the prior nick translate molecule.

GENOME WALKING BY SELECTIVE AMPLIFICATION OF NICK-TRANSLATE
DNA LIBRARY AND AMPLIFICATION FROM COMPLEX
MIXTURES OF TEMPLATES

[0001] This application claims priority to U.S. Provisional Patent Application 
Serial No. 60/288,205, filed May 2, 2001. 

FIELD OF THE INVENTION 

[0002] The present invention relates generally to the fields of molecular biology
and genomics. Particularly, it concerns utilization of DNA libraries for amplifying and
analyzing DNA. More particularly, it concerns utilizing DNA libraries of nick translated
products for chromosome walking.

DESCRIPTION OF RELATED ART 

A. DNA preparation using in vivo and in vitro amplification and multiplexed
versions thereof

[0003] Because the amount of any specific DNA molecule that can be isolated
from even a large number of cells is usually very small, the only practical methods to prepare
enough DNA molecules for most applications involve amplification of specific DNA
molecules in vivo or in vitro. There are basically six general methods important for
manipulating DNA for analysis: 1) in vivo cloning of unique fragments of DNA, 2) in vitro
amplification of unique fragments of DNA, 3) in vivo cloning of random libraries (mixtures)
of DNA fragments, 4) in vitro preparation of random libraries of DNA fragments, 5) in vivo
cloning of ordered libraries of DNA, 6) in vitro preparation of ordered libraries of DNA. The
beneficial effect of amplifying mixtures of DNA is that it facilitates analysis of large pieces
of DNA (e.g., chromosomes) by creating libraries of molecules that are small enough to be
analyzed by existing techniques. For example, the largest molecule that can be subjected to
DNA sequencing methods is less than 2000 bases long, which is many orders of magnitude
shorter than single chromosomes of organisms. Although short molecules can be analyzed,
considerable effort is required to assemble the information from the analysis of the short
molecules into a description of the larger piece of DNA.

1. In vivo cloning of unique DNA

[0004] Unique-sequence source DNA molecules can be amplified by separating
them from other molecules (e.g., by electrophoresis), ligating them into an autonomously
replicating genetic element (e.g., a bacterial plasmid), transfecting a host cell with the
recombinant genetic element, and growing a clone of a single transfected host cell to produce
many copies of the genetic element having the insert with the same unique sequence as the
source DNA (Sambrook, et al., 1989).

2. In vitro amplification of unique DNA

[0005] There are many methods designed to amplify DNA in vitro. Usually these
methods are used to prepare unique DNA molecules from a complex mixture, e.g., genomic
DNA or an artificial chromosome. Alternatively, a restricted set of molecules can be prepared
as a library that represents a subset of sequences in the complex mixture. These amplification
methods include PCR, rolling circle amplification, and strand displacement (Walker, et al.
1996a; Walker, et al. 1996b; U.S. Patent No. 5,648,213; U.S. Patent No. 6,124,120).

[0006] The polymerase chain reaction (PCR) can be used to amplify specific
regions of DNA between two known sequences (U.S. Patent No. 4,683,195; U.S. Patent No.
4,683,202; Frohman et al., 1995). PCR involves the repetition of a cycle consisting of
denaturation of the source (template) DNA, hybridization of two oligonucleotide primers to
known sequences flanking the region to be amplified, and primer extension using a DNA
polymerase to synthesize strands complementary to the DNA region located between the two
primer sites. Because the products of one cycle of amplification serve as source DNA for
succeeding cycles, the amplification is exponential. PCR can synthesize large numbers of
specific molecules quickly and inexpensively.
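
The exponential character of PCR described above can be illustrated with a short
calculation. The sketch below is only an idealized model: the function name pcr_copies and
the per-cycle efficiency parameter are assumptions introduced for illustration and do not
appear in the cited references. Each cycle is assumed to copy a fixed fraction of the
templates present, so that perfect doubling corresponds to an efficiency of 1.0.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized expected copy number after a number of PCR cycles.

    efficiency is the (hypothetical) fraction of templates copied in each
    cycle; 1.0 means every template is duplicated in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Products of one cycle serve as templates for the next, so the
    # growth factor (1 + efficiency) compounds once per cycle.
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        ideal = pcr_copies(1, n)
        reduced = pcr_copies(1, n, efficiency=0.8)
        print(f"{n} cycles: ideal {ideal:.3g} copies, 80% efficiency {reduced:.3g} copies")
```

Under this model a modest loss of per-cycle efficiency compounds into a large difference in
yield, which is consistent with the dependence of amplification level on primer and template
properties.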

[0007] The major disadvantages of the PCR method to amplify DNA are that 1)
information about two flanking sequences must be known in order to specify the sequences of
the primers, 2) synthesis of primers is expensive, 3) the level of amplification achieved
depends strongly on the primer sequences, source DNA sequence, and the molecular weight
of the amplified DNA, and 4) the length of amplified DNA is usually limited to less than 5 kb,
although "long-distance" PCR (Cheng, 1994) allows molecules as long as 20 kb to be
amplified.

[0008] "One-sided PCR" techniques are able to amplify unknown DNA adjacent
to one known sequence. These techniques can be divided into four categories: a) ligation-
mediated PCR, facilitated by addition of a universal adaptor sequence to a terminus usually
created by digestion with a restriction endonuclease; b) universal primer-mediated PCR,
facilitated by a primer extension reaction initiated at arbitrary sites; c) terminal transferase-
mediated PCR, facilitated by addition of a homonucleotide "tail" to the 3' end of DNA
fragments; and d) "inverse PCR," facilitated by circularization of the template molecules.
These techniques can be used to amplify successive regions along a large DNA template in a
process sometimes called "chromosome walking."

[0009] Ligation-mediated PCR is practiced in many forms. Rosenthal et al.
(1990) outlined the basic process of amplifying an unknown region of DNA immediately
adjacent to a known sequence located near the end of a restriction fragment. Reiley et al.
(1990) used primers that were not exactly complementary with the adaptors in order to
suppress amplification of molecules that did not have a specific priming site. Jones (1993)
and Siebert (1995; U.S. Patent No. 5,565,340) used long universal primers that formed
intrastrand "panhandle" structures that suppressed PCR of molecules having two universal
adaptors. Arnold (1994) used "vectorette" primers having unpaired central regions to
increase the specificity of one-sided PCR. Macrae and Brenner (1994) amplified short inserts
from a Fugu genomic clone library using nested primers from a specific sequence and from
vector sequences. Lin et al. (1995) ligated an adaptor to restriction fragment ends that had an
overhanging 5' end and employed hot-start PCR with a single universal anchor primer and
nested specific-site primers to specifically amplify human sequences. Liao et al. (1997) used
two specific site primers and 2 universal adaptors, one of which had a blocked 3' end to
reduce non-specific background, to amplify zebrafish promoters. Devon et al. (1995) used
"splinkerette-vectorette" adaptors with special secondary structure in order to decrease non-
specific amplification of molecules with two universal sequences during ligation-mediated
PCR. Padegimas and Reichert (1998) used phosphorothioate-blocked oligonucleotides and
exo III digestion to remove the unligated and partially ligated molecules from the reactions
before performing PCR, in order to increase the specificity of amplification of maize
sequences. Zhang and Gurr (2000) used ligation-mediated hot-start PCR of restriction
fragments using nested primers in order to amplify up to 6 kb of a fungal genome. The large
amplicons were subsequently directly sequenced using primer extension.

[0010] To increase the specificity of ligation-mediated PCR products, many
methods have been used to "index" the amplification process by selection for specific
sequences adjacent to one or both termini (e.g., Smith, 1992; Unrau, 1994; Guilfoyle, 1997;
U.S. Patent No. 5,508,169).

[0011] One-sided PCR can also be achieved by direct amplification using a
combination of unique and non-unique primers. Harrison et al. (1997) performed one-sided
PCR using a degenerate oligonucleotide primer that was complementary to an unknown
sequence and three nested primers complementary to a known sequence in order to sequence
transgenes in mouse cells. US5994058 specifies using a unique PCR primer and a second,
partially degenerate PCR primer to achieve one-sided PCR. Weber et al. (1998) used direct
PCR of genomic DNA with nested primers from a known sequence and 1-4 primers
complementary to frequent restriction sites. This technique does not require restriction
digestion and ligation of adaptors to the ends of restriction fragments.

[0012] Terminal transferase can also be used in one-sided PCR. Cormack and
Somssich (1997) were able to amplify the termini of genomic DNA fragments using a
method called RAGE (rapid amplification of genome ends) by a) restricting the genome with
one or more restriction enzymes, b) denaturing the restricted DNA, c) providing a 3'
polythymidine tail using terminal transferase, and d) performing two rounds of PCR using
nested primers complementary to a known sequence as well as the adaptor. Rudi et al.
(1999) used terminal transferase to achieve chromosome walking in bacteria using a method
of one-sided PCR that is independent of restriction digestion by a) denaturation of the
template DNA, b) linear amplification using a primer complementary to a known sequence,
c) addition of a poly-G "tail" to the 3' end of the single-stranded products of linear
amplification using a reaction catalyzed by terminal transferase, and d) PCR amplification of
the products using a second primer within the known sequence and a poly-G primer
complementary to the poly-G tail in the unknown region. The products amplified by Rudi
(1999) have a very broad size distribution, probably caused by a broad distribution of lengths
of the linearly-amplified DNA molecules.

[0013] RNA polymerase can also be used to achieve one-sided amplification of
DNA. U.S. Patent No. 6,027,913 shows how one-sided PCR can be combined with
transcription with RNA polymerase to amplify and sequence regions of DNA with only one
known sequence.

[0014] Inverse PCR (Ochman et al., 1988) is another method to amplify DNA
based on knowledge of a single DNA sequence. The template for inverse PCR is a circular
molecule of DNA created by a complete restriction digestion, which contains a small region
of known sequence as well as adjacent regions of unknown sequence. The oligonucleotide
primers are oriented such that during PCR they give rise to primer extension products that
extend away from the known sequence. This "inside-out" PCR results in linear DNA products
with known sequences at the termini.

[0015] The disadvantages of all "one-sided PCR" methods are that a) the lengths of
the products are restricted by the limitations of PCR (normally about 2 kb, but with special
reagents up to 50 kb); b) whenever the products are single DNA molecules longer than 1 kb
they are too long to directly sequence; c) in ligation-mediated PCR the amplicon lengths are
very unpredictable due to random distances between the universal priming site and the
specific priming site(s), resulting in some products that are too short to walk a
significant distance, some which are preferentially amplified due to small size, and some that
are too long to amplify and analyze; and d) in methods that use terminal transferase to add a
polynucleotide tail to the end of a primer extension product, there is great heterogeneity in the
length of the amplicons due to sequence-dependent differences in the rate of primer
extension.

[0016] Strand displacement amplification (Walker, et al. 1996a; Walker, et al.
1996b; U.S. Patent No. 5,648,213; U.S. Patent No. 6,124,120) is a method to amplify one or
more termini of DNA fragments using an isothermal strand displacement reaction. The
method is initiated at a nick near the terminus of a double-stranded DNA molecule, usually
generated by a restriction enzyme, followed by a polymerization reaction by a DNA
polymerase that is able to displace the strand complementary to the template strand. Linear
amplification of the complementary strand is achieved by reusing the template multiple times
by nicking each product strand as it is synthesized. The products are strands with 5' ends at a
unique site and 3' ends that are various distances from the 5' ends. The extent of the strand
displacement reaction is not controlled and therefore the lengths of the product strands are not
uniform. The polymerase used for strand displacement amplification does not have a 5'
exonuclease activity.

[0017] Rolling circle amplification (U.S. Patent No. 5,648,245) is a method to
increase the effectiveness of the strand displacement reaction by using a circular template.
The polymerase, which does not have a 5' exonuclease activity, makes multiple copies of the
information on the circular template as it makes multiple continuous cycles around the
template. The length of the product is very large, typically too large to be directly
sequenced. Additional amplification is achieved if a second strand displacement primer is
added to the reaction to use the first strand displacement product as a template.

3. In vivo cloning of DNA of random libraries

[0018] Libraries are collections of small DNA molecules that represent all parts of
a larger DNA molecule or collection of DNA molecules (Primrose, 1998; Cantor and Smith,
1999). Libraries can be used for analytical and preparative purposes. Genomic clone
libraries are collections of bacterial clones containing fragments of genomic DNA. cDNA
clone libraries are collections of clones derived from the mRNA molecules in a tissue.

[0019] Cloning of non-specific DNA is commonly used to separate and amplify
DNA for analysis. DNA from an entire genome, one chromosome, a virus, or a bacterial
plasmid is fragmented by a suitable method (e.g., hydrodynamic shearing or digestion with
restriction enzymes), ligated into a special region of a bacterial plasmid or other cloning
vector, transfected into competent cells, amplified as a part of a plasmid or chromosome
during proliferation of the cells, and harvested from the cell culture. Critical to the specificity
of this technique is the fact that the mixture of cells carrying different DNA inserts can be
diluted and aliquoted such that some of the aliquots, whether on a surface or in a volume of
solution, contain a single transfected cell containing a unique fragment of DNA.
Proliferation of this single cell (in vivo cloning) amplifies this unique fragment of DNA so
that it can be analyzed. This "shotgun" cloning method is used very frequently, because: 1) it
is inexpensive, 2) it produces very pure sequences that are usually faithful copies of the
source DNA, 3) it can be used in conjunction with clone screening techniques to create an
unlimited amount of specific-sequence DNA, 4) it allows simultaneous amplification of many
different sequences, 5) it can be used to amplify DNA as large as 1,000,000 bp long, and 6)
the cloned DNA can be directly used for sequencing and other purposes.

a. Multiplex cloning

[0020] Cloning is inexpensive, because many pieces of DNA can be
simultaneously transfected into host cells. The general term for this process of mixing a
number of different entities (e.g., electronic signals or molecules) is "multiplexing," and is a
common strategy for increasing the number of signals or molecules that can be processed
simultaneously and subsequently separated to recover the information about the individual
signals or molecules. In the case of conventional cloning the recovery process involves
diluting the bacterial culture such that an aliquot contains a single bacterium carrying a single
plasmid, allowing the bacterium to multiply to create many copies of the original plasmid,
and isolating the cloned DNA for further analysis.

[0021] The principle of multiplexing different molecules in the same transfection
experiment is critical to the economy of the cloning method. However, after the transfection
each clone must be grown separately and the DNA isolated separately for analysis. These
steps, especially the DNA isolation step, are costly and time consuming. Several attempts
have been made to multiplex steps after cloning, whereby hundreds of clones can be
combined during the steps of DNA isolation and analysis and the characteristics of the
individual DNA molecules recovered later. In one version of multiplex cloning the DNA
fragments are separated into a number of pools (e.g., one hundred pools). Each pool is
ligated into a different vector, possessing a nucleic acid tag with a unique sequence, and
transfected into the bacteria. One clone from each transfection pool is combined with one
clone from each of the other transfection pools in order to create a mixture of bacteria having
a mixture of inserted sequences, where each specific inserted sequence is tagged with a
unique vector sequence, and therefore can be identified by hybridization to the nucleic acid
tag. This mixture of cloned DNA molecules can be subsequently separated and subjected to
any enzymatic, chemical, or physical processes for analysis such as treatment with
polymerase or size separation by electrophoresis. The information about individual
molecules can be recovered by detection of the nucleic acid tag sequences by hybridization,
PCR amplification, or DNA sequencing. Church has shown methods and compositions to
use multiplex cloning to sequence DNA molecules by pooling clones tagged with different
labels during the steps of DNA isolation, sequencing reactions, and electrophoretic separation
of denatured DNA strands (U.S. Patent Nos. 4,942,124; 5,149,625). The tags are added to
the DNA as parts of the vector DNA sequences. The tags used can be detected using
oligonucleotides labeled with radioactivity, fluorescent groups, or volatile mass labels
(Cantor and Smith, 1999; U.S. Patent Nos. 4,942,124; 5,149,625; 5,112,736; Richterich and
Church, 1993). U.S. Patent No. 5,714,318 is directed to a technique whereby the tag
sequences are ligated to the DNA fragments before cloning using a universal vector.
Furthermore, PCT WO 98/15644 specifies a method whereby the tag sequences added before
transfection are amplified using PCR after electrophoretic separation of the denatured DNA.
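
The logic of tag-based multiplexing, in which molecules from different pools are processed
together and later told apart by their tags, can be sketched in a few lines. The following
is a conceptual illustration only and not the procedure of any cited patent: the function
names, the tag names (TAG01, TAG02) and the short insert sequences are all hypothetical, and
the "detection" step is modeled as a simple lookup rather than hybridization.

```python
from collections import defaultdict


def tag_fragments(fragments_by_pool, tag_for_pool):
    """Attach each pool's unique tag (carried by its vector) to its fragment."""
    return [(tag_for_pool[pool], insert) for pool, insert in fragments_by_pool.items()]


def demultiplex(mixture):
    """Recover the inserts belonging to each tag from a pooled mixture.

    In the laboratory this step corresponds to probing the pooled material
    with one tag-specific label at a time; here it is a dictionary grouping.
    """
    recovered = defaultdict(list)
    for tag, insert in mixture:
        recovered[tag].append(insert)
    return dict(recovered)


if __name__ == "__main__":
    # Hypothetical pools, tags, and insert sequences.
    fragments = {"pool1": "ACGTAC", "pool2": "GGCTTA"}
    tags = {"pool1": "TAG01", "pool2": "TAG02"}
    mixture = tag_fragments(fragments, tags)  # processed together from here on
    print(demultiplex(mixture))
```

The point of the design is that every step between tagging and detection is performed once
on the mixture instead of once per clone.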

b. Disadvantages

[0022] The disadvantage of preparing DNA by amplifying random fragments of
DNA is that considerable effort is necessary to assemble the information within the short
fragments into a description of the original, source DNA molecule. Nevertheless, amplified
short DNA fragments are commonly used for many applications, including sequencing by the
technique called "shotgun sequencing." Shotgun sequencing involves sequencing one or both
ends of small DNA fragments that have been cloned from randomly-fragmented large pieces
of DNA. During the sequencing of many such random fragments of DNA, overlapping
sequences are identified from those clones that by chance contain redundant sequence
information. As more and more fragments are sequenced more overlaps can be found from
contiguous regions (contigs). As more and more fragments are sequenced the regions that are
not represented become smaller and less frequent. However, even after sequencing enough
fragments that the average region has been sequenced 5-10 times, there will still be gaps
between contigs due to statistical sampling effects and to systematic under-representation of
some sequences during cloning or PCR amplification (ref). Thus the disadvantage of
sequencing random fragments of DNA is that 1) a 5-10 fold excess of DNA must be
isolated, subjected to sequencing reactions, and analyzed before having large contiguous
sequenced regions, and 2) there are still numerous gaps in the sequence that must be filled by
expensive and time-consuming steps.
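
The statistical sampling effect mentioned above can be estimated with a simple Poisson
approximation of the kind commonly associated with Lander and Waterman. This model is
introduced here for illustration and is not part of the source text: it assumes reads of
equal length placed uniformly at random, ignores the minimum overlap needed to join two
reads, and says nothing about the systematic cloning or PCR bias that the paragraph also
names. The function name and the example genome size are likewise assumptions.

```python
import math


def shotgun_statistics(genome_length: int, read_length: int, num_reads: int):
    """Return (coverage, uncovered fraction, expected number of gaps).

    Poisson approximation: with average coverage c, a given base is missed
    by every read with probability exp(-c), and each read ends a contig
    (is followed by a gap) with roughly the same probability.
    """
    coverage = num_reads * read_length / genome_length
    uncovered_fraction = math.exp(-coverage)
    expected_gaps = num_reads * math.exp(-coverage)
    return coverage, uncovered_fraction, expected_gaps


if __name__ == "__main__":
    # Hypothetical 1,000,000 bp target sequenced with 500-base reads.
    for reads in (10_000, 20_000):
        c, missing, gaps = shotgun_statistics(1_000_000, 500, reads)
        print(f"coverage {c:.0f}x: {missing:.2e} of bases unsequenced, "
              f"about {gaps:.1f} gaps expected")
```

Even in this idealized setting, gaps persist at the 5-fold redundancy cited above and only
become rare toward 10-fold, before any sequence-dependent bias is taken into account.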

4. In vitro preparation of DNA as random libraries

[0023] DNA libraries can be formed in vitro and subjected to various selection
steps to recover information about specific sequences. In vitro libraries are rarely used in
genomics, because the methods that exist for creating such libraries do not offer advantages
over cloned libraries. In particular the methods used to amplify the in vitro libraries are not
able to amplify all of the DNA in an unbiased manner, because of the size and sequence
dependence of amplification efficiency. WO 00/18960 describes how different methods of
DNA amplification can be used to create a library of DNA molecules representing a specific
subset of the sequences within the genome for purposes of detecting genetic polymorphisms.
"Random-prime PCR" (U.S. Patent No. 5,043,272; U.S. Patent No. 5,487,985), "random-
prime strand displacement" (U.S. Patent No. 6,124,120) and "AFLP" (U.S. Patent No.
6,045,994) are three examples of methods to create libraries that represent subsets of complex
mixtures of DNA molecules.

[0024] Single-molecule PCR can be used to amplify individual randomly-
fragmented DNA molecules (Lukyanov et al., 1996). In one method, the source DNA is first
fragmented into molecules usually less than 10,000 bp in size, ligated to adaptor
oligonucleotides, and extensively diluted and aliquoted into separate fractions such that the
fractions often contain only a single molecule. PCR amplification of a fraction containing a
single molecule creates a very large number of molecules identical to one of the original
fragments. If the molecules are randomly fragmented, the amplified fractions represent DNA
from random positions within the source DNA.
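
How dilute the sample must be for fractions to "often contain only a single molecule" can be
estimated by treating the number of molecules per fraction as Poisson distributed. This is
an assumption made for illustration, as are the function names and the example mean values;
the cited method does not specify a statistical model.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a fraction receives exactly k molecules."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def single_molecule_fraction(mean: float) -> float:
    """Share of all fractions that contain exactly one molecule."""
    return poisson_pmf(1, mean)


def clonal_share_of_occupied(mean: float) -> float:
    """Among fractions that contain any DNA, the share holding exactly one molecule."""
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


if __name__ == "__main__":
    for mean in (1.0, 0.3, 0.1):
        print(f"mean {mean} molecules per fraction: "
              f"{single_molecule_fraction(mean):.3f} of fractions hold one molecule, "
              f"{clonal_share_of_occupied(mean):.3f} of occupied fractions are single-molecule")
```

The trade-off is that stronger dilution makes occupied fractions more likely to hold a
single molecule, but leaves a larger share of the fractions empty.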

[0025] WO 00/15779 A2 describes how a specific sequence can be amplified from
a library of circular molecules with random genomic inserts using rolling circle amplification.

5. In vivo cloning of ordered libraries of DNA

[0026] Directed cloning is a procedure to clone DNA from different parts of a
larger piece of DNA, usually for the purpose of sequencing DNA from different positions
along the source DNA. Methods to clone DNA with "nested deletions" have been used to
make "ordered libraries" of clones that have DNA starting at different regions along a long
piece of source DNA. In one version, one end of the source DNA is digested with one or
more exonuclease activities to delete part of the sequence (McCombie et al., 1991; U.S.
Patent No. 4,843,003). By controlling the extent of exonuclease digestion, the average
amount of the deletion can be controlled. The DNA molecules are subsequently separated
based on size and cloned. By cloning molecules with different molecular weights, many
copies of identical DNA plasmids are produced that have inserts ending at controlled
positions within the source DNA. Transposon insertion (Berg et al., 1994) is also used to
clone different regions of source DNA by facilitating priming or cleavage at random
positions in the plasmids. The size separation and recloning steps make both of these
methods labor intensive and slow. They are generally limited to covering regions less than
10 kb in size and cannot be used directly on genomic DNA but rather on cloned DNA
molecules.

6. In vitro preparation of ordered libraries of DNA

[0027] Ordered libraries have not been frequently created in vitro. Hagiwara
(1996) used vectorette adaptors and exonuclease digestions to create a nested set of one-sided
PCR products that could be used to walk across a cosmid after size separation. No
methods are known to create ordered libraries of DNA molecules directly from genomic
DNA.

B. DNA physical mapping to create ordered clones 

[0028] There is often a need to organize a library of randomly cloned DNA
molecules into an ordered library where the clones are arranged according to position in the
genome (Primrose, 1998; Cantor and Smith, 1999). Some of the purposes for creating an
ordered library are 1) to compare overlapping clones to detect defects (e.g., deletions) in
some of the clones, 2) to decide which clones should be used to determine the underlying
DNA sequence with the least redundancy in sequencing effort, 3) to localize genetic features
within the genome, 4) to access different regions of the genome on the basis of their
relationship to the genetic map or proximity to another region, and 5) to compare the
structure of the genomes of different individuals and different species. There are four basic
methods for creating ordered libraries of clones: 1) hybridization to determine sequence
homology among different clones, 2) fluorescent in situ hybridization (FISH), 3) restriction
analysis, and 4) STS mapping.

1. Mapping by hybridization 

[0029] The first method usually involves hybridization of one clone or other
identifiable sequence to all other clones in a library. Those clones that hybridize contain
overlapping sequences. This method is useful for locating clones that overlap a common site
(e.g., a specific gene) in the genome, but is too laborious to create an ordered library of an
entire genome. In addition many organisms have large amounts of repetitive DNA that can
give false indications of overlap between two regions. The resolution of the hybridization
techniques is only as good as the distance between known sequences of DNA.

2. Mapping by FISH 

[0030] The FISH method allows a particular sequence or limited set of sequences
to be localized along a chromosome by hybridization of a fluorescently-labeled probe with a
spread of intact chromosomes, followed by light-microscopic localization of the fluorescence.
This technique is also only of use to locate a specific sequence or small number of sequences,
rather than to create a physical map of the entire genome or an ordered library representing
the entire genome. The resolution of the light microscope limits the resolution of FISH to
about 1,000,000 bp. To map a single-copy sequence, the FISH probe usually needs to be
about 10,000 bp long.

3. Mapping by restriction digestion 

[0031] Mapping by restriction digestion is frequently used to determine overlaps
between clones, thereby allowing ordered libraries of clones to be constructed. It involves
assembly of a number of large clones into a contiguous region (contig) by analyzing the
overlaps in the restriction patterns of related clones. This method is insensitive to the
presence of repetitive DNA. The products of a complete or partial restriction digestion of
every clone are size separated by electrophoresis and the molecular weights of the fragments
analyzed by computer to find correlated sequences in different clones. The information from
the restriction patterns produced by five or more restriction enzymes is usually adequate to
determine not only which clones overlap, but also the extent of overlap and whether some of
the clones have deletions, additions, rearrangements, etc. Physical mapping of restriction
sites is a very tedious process, because of the very large numbers of clones that have to be
evaluated. For example, > 300,000 BAC clones of 100,000 bp length need to be analyzed to
map the human genome. Using conventional techniques mapping two restriction sites would
require at least 300,000 bacterial cultures and DNA isolations, as well as 600,000 restriction
digestions and size separations.
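
The computer comparison of fragment sizes described above can be illustrated, in a much
simplified form, by counting the fragments that two clones appear to share in a single
digest. The sketch below is not the algorithm of any particular mapping program: the
function name, the 2% relative size tolerance, and the example fragment sizes are all
assumptions chosen for illustration, and a real analysis would combine several enzymes and
a statistical score.

```python
def shared_fragment_count(sizes_a, sizes_b, tolerance=0.02):
    """Count fragments of clone A that match a fragment of clone B in size.

    Two fragments are treated as the same band when their sizes differ by
    no more than `tolerance` (relative to the larger one). Each fragment of
    clone B can be matched at most once.
    """
    remaining = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[index]  # consume the matched band
                break
    return shared


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) from one digest of two clones.
    clone_a = [1200, 3400, 560, 8000]
    clone_b = [3390, 8050, 150, 2200]
    print("shared bands:", shared_fragment_count(clone_a, clone_b))
```

A high shared-band count relative to the number of bands suggests overlap, although
fragments of similar size from unrelated regions can coincide by chance, which is one
reason patterns from several enzymes are combined.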

4. Mapping by STS amplification 

[0032] Sequence tagged sites (STSs) are sequences, often from the 3' untranslated
portions of mRNA, that can be uniquely amplified in the genome. High-throughput methods
employing sophisticated equipment have been devised to screen for the presence of tens of
thousands of STSs in tens of thousands of clones. Two clones overlap to the extent that they
share common STSs.

C. DNA Sequencing Reactions 

[0033] DNA sequencing is the most important analytical tool for understanding
the genetic basis of living systems. The process involves determining the positions of each of
the four major nucleotide bases, adenine (A), cytosine (C), guanine (G), and thymine (T)
along the DNA molecule(s) of an organism. Short sequences of DNA are usually determined
by creating a nested set of DNA fragments that begin at a unique site and terminate at a
plurality of positions comprised of a specific base. The fragments terminated at each of the
four natural nucleic acid bases (A, T, G and C) are then separated according to molecular size
in order to determine the positions of each of the four bases relative to the unique site. The
pattern of fragment lengths caused by strands that terminate at a specific base is called a
"sequencing ladder." The interpretation of base positions as the result of one experiment on a
DNA molecule is called a "read." There are different methods of creating and separating the
nested sets of terminated DNA molecules.

1. Maxam-Gilbert method

[0034] The Maxam-Gilbert method involves degrading DNA at a specific base
using chemical reagents. The DNA strands terminating at a particular base are denatured and
electrophoresed to determine the positions of the particular base. The Maxam-Gilbert method
involves dangerous chemicals, and is time- and labor-intensive. It is no longer used for most
applications.

2. Sanger method 

[0035] The Sanger sequencing method is currently the most popular format for
sequencing. It employs single-stranded DNA (ssDNA) created using special viruses like
M13 or by denaturing double-stranded DNA (dsDNA). An oligonucleotide sequencing
primer is hybridized to a unique site of the ssDNA and a DNA polymerase is used to
synthesize a new strand complementary to the original strand using all four
deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and small amounts of
one or more dideoxyribonucleotide triphosphates (ddATP, ddCTP, ddGTP, and/or ddTTP),
which cause termination of synthesis. The DNA is denatured and electrophoresed into a
"ladder" of bands representing the distance of the termination site from the 5' end of the
primer. If only one ddNTP (e.g., ddGTP) is used, only those molecules that end with guanine
will be detected in the ladder. By using ddNTPs with four different labels all four ddNTPs
can be incorporated in the same polymerization reaction and the molecules ending with each
of the four bases can be separately detected after electrophoresis in order to read the base
sequence.
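
The way a base sequence is read from the four ladders can be sketched as follows. This is
an idealized illustration, not a model of real electrophoresis: it assumes that a
terminated product is obtained at every position, that every length is resolved, and that
lengths are counted from the first base added after the primer. The function names and the
example strand are hypothetical.

```python
def termination_lengths(synthesized_strand: str, base: str):
    """Lengths of the products that terminate at each occurrence of `base`.

    Models a reaction containing one ddNTP: synthesis can stop wherever
    that base is incorporated into the growing strand.
    """
    return [position + 1
            for position, incorporated in enumerate(synthesized_strand)
            if incorporated == base]


def read_ladders(ladders) -> str:
    """Reconstruct the sequence by ordering all bands from shortest to longest."""
    base_at_length = {}
    for base, lengths in ladders.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


if __name__ == "__main__":
    strand = "GATTACAGGC"  # hypothetical newly synthesized strand
    ladders = {base: termination_lengths(strand, base) for base in "ACGT"}
    for base, lengths in ladders.items():
        print(base, lengths)
    print("read:", read_ladders(ladders))
```

Reading the bands in order of increasing length recovers the synthesized strand, which is
the complement of the template; the practical read length is set by how many of those
lengths the separation can resolve.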

[0036] DNA that is flanked by vector or PCR primer DNA of known
sequence can undergo Sanger termination reactions initiated from one end using a primer
complementary to those known sequences. These sequencing primers are inexpensive,
because the same primers can be used for DNA cloned into the same vector or PCR amplified
using primers with common terminal sequences. Commonly-used electrophoretic techniques
for separating the dideoxyribonucleotide-terminated DNA molecules are limited to resolving
sequencing ladders shorter than 500-1000 bases. Therefore only the first 500-1000
nucleic acid bases can be "read" by this or any other method of sequencing the DNA.
Sequencing DNA beyond the first 500-1000 bases requires special techniques.

3. Other base-specific termination methods

[0037] Other termination reactions have been proposed. One group of proposals 
involves substituting thiolated or boronated base analogs that resist exonuclease activity. 
After incorporation reactions very similar to Sanger reactions, a 3' to 5' exonuclease is used 
to resect the synthesized strand to the point of the last base analog. These methods have no 
substantial advantage over the Sanger method. 

[0038] Methods have been proposed to reduce the number of electrophoretic 
separations required to sequence large amounts of DNA. These include multiplex sequencing 
of large numbers of different molecules on the same electrophoretic device, by attaching 
unique tags to different molecules so that they can be separately detected. Commonly, 
different fluorescent dyes are used to multiplex up to 4 different types of DNA molecules in a 
single electrophoretic lane or capillary (U.S. Patent No. 4,942,124). Less commonly, the 
DNA is tagged with a large number of different nucleic acid sequences during cloning or PCR 
amplification, and detected by hybridization (U.S. Patent No. 4,942,124) or by mass 
spectrometry (U.S. Patent No. 4,942,124). 

[0039] In principle, the sequence of a short fragment can be read by hybridizing 
different oligonucleotides with the unknown sequence, followed by deciphering the 
information to reconstruct the sequence. This "sequencing by hybridization" is limited to 
fragments of DNA < 50 bp in length. It is difficult to amplify such short pieces of DNA for 
sequencing. However, even if sequencing many random 50 bp pieces were possible, 
assembling the short, sometimes overlapping sequences into the complete sequence of a large 
piece of DNA would be impossible. The use of sequencing by hybridization is currently 
limited to resequencing, that is, testing the sequence of regions that have already been 
sequenced. 

D. Preparing DNA for determining long sequences 

[0040] Because it is currently very difficult to separate DNA molecules longer 
than 1000 bases with single-base resolution, special methods have been devised to sequence 
DNA regions within larger DNA molecules. The "primer walking" method initiates the 
Sanger reaction at sequence-specific sites within long DNA. However, most emphasis is on 
methods to amplify DNA in such a way that one of the ends originates from a specific 
position within the long DNA molecule. 

1. Primer walking 

[0041] Once part of a sequence has been determined (e.g., the terminal 500 
bases), a custom sequencing primer can be made that is complementary to the known part of 
the sequence, and used to prime a Sanger dideoxyribonucleotide termination reaction that 
extends further into the unknown region of the DNA. This procedure is called "primer 
walking." The requirement to synthesize a new oligonucleotide every 400 - 1000 bp makes 
this method expensive. The method is slow, because each step is done in series rather than in 
parallel. In addition, each new primer has a significant failure rate until optimum conditions 
are determined. Primer walking is primarily used to fill gaps in the sequence that have not 
been read after shotgun sequencing or to complete the sequencing of small DNA fragments 
<5,000 bp in length. However, WO 00/60121 addresses using a single synthetic primer for 
PCR to genome walk to unknown sequences from a known sequence. The 5'-blocked primer 
anneals to the template and is extended, followed by coupling to the extended product of a 
3'-blocked oligonucleotide of known sequence, thereby creating a single stranded molecule 
having had only a single region of known target DNA sequence. By sequencing an amplified 
product from the extended product having the coupled 3'-blocked oligonucleotide, the 
process can be applied reiteratively to elucidate consecutive adjacent unknown sequences. 

2. PCR amplification 

[0042] PCR can be used to amplify a specific region within a large DNA 
molecule. Because the PCR primers must be complementary to the DNA flanking the 
specific region, this method is usually used only to prepare DNA to "resequence" a region of 
DNA. 

3. Nested deletion and transposon insertion 

[0043] As described above, cloning or PCR amplification of long DNA with 
nested deletions brought about by nuclease cleavage or transposon insertion enables ordered 
libraries of DNA to be created. When exonuclease is used to progressively digest one end of 
the DNA, there is some control over the position of one end of the molecule. However, the 
exonuclease activity cannot be controlled to give a narrow distribution in molecular weights, 
so typically the exonuclease-treated DNA is separated by electrophoresis to better select the 
position of the end of the DNA samples before cloning. Because transposon insertion is 
nearly random, clones containing inserted elements have to be screened before choosing 
which clones have the insertion at a specific internal site. The labor-intensive steps of clone 
screening make these methods impractical except for DNA less than about 10 kb long. 

4. Junction-fragment DNA probes for preparing ordered DNA clones 

[0044] Collins and Weissman have proposed to use "junction-fragment DNA 
probes and probe clusters" (U.S. Patent No. 4,710,465) to fractionate large regions of 
chromosomes into ordered libraries of clones. That patent proposes to size fractionate 
genomic DNA fragments after partial restriction digestion, circularize the fragments in each 
size-fraction to form junctions between sequences separated by different physical distances in 
the genome, and then clone the junctions in each size fraction. By screening all the clones 
derived from each size-fraction using a hybridization probe from a known sequence, ordered 
libraries of clones could be created having sequences located different distances from the 
known sequence. Although this method was designed to walk along megabase distances 
along chromosomes, it was never put into practical use because of the necessity to maintain 
and screen hundreds of thousands of clones from each size fraction. In addition, cross 
hybridization would be expected to yield a large fraction of false positive clones. 

5. Shotgun cloning 

[0045] The only practical method for preparing DNA longer than 5 kb for 
sequencing is subcloning the source DNA as random fragments small enough to be 
sequenced. The large source DNA molecule is fragmented by sonication or hydrodynamic 
shearing, fractionated to select the optimum fragment size, and then subcloned into a 
bacterial plasmid or virus genome. The individual subclones can be subjected to Sanger or 
other sequencing reactions in order to determine sequences within the source DNA. If many 
overlapping subclones are sequenced, the entire sequence for the large source DNA can be 
determined. The advantages of shotgun cloning over the other techniques are: 1) the 
fragments are small and uniform in size so that they can be cloned with high efficiency 
independent of sequence; 2) the fragments can be short enough that both strands can be 
sequenced using the Sanger reaction; 3) transformation and growth of many clones is rapid 
and inexpensive; and 4) clones are very stable. 

E. Genomic sequencing 

[0046] Current techniques to sequence genomes (as well as any DNA larger than 
about 5 kb) depend upon shotgun cloning of small random fragments from the entire DNA. 
Bacteria and other very small genomes can be directly shotgun cloned and sequenced. This is 
called "pure shotgun sequencing." Larger genomes are usually first cloned as large pieces 
and each clone is shotgun sequenced. This is called "directed shotgun sequencing." 

1. Pure shotgun sequencing 

[0047] Genomes up to several millions or billions of base pairs in length can be 
randomly fragmented and subcloned as small fragments. In the process of fragmentation, 
however, all information about the relative positions of the fragment sequences in the 
native genome is lost. This information can be recovered by sequencing with 5- to 10-fold 
redundancy (i.e., the number of bases sequenced in different reactions adds up to 5 to 10 
times the number of bases in the genome) so as to generate sufficiently numerous overlaps 
between the sequences of different fragments that a computer program can assemble the 
sequences from the subclones into large contiguous sequences (contigs). However, due to 
some regions being more difficult to clone than others and due to incomplete statistical 
sampling, there will still be some regions within the genome that are not sequenced even 
after highly redundant sequencing. These unknown regions are called "gaps." After assembly 
of the shotgun sequences into contigs, the sequencing is "finished" by filling in the gaps. 
Finishing must be done by additional sequencing of the subclones, by primer walking 
beginning at the edge of a contig, or by sequencing PCR products made using primers from 
the edges of adjacent contigs. 
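
The effect of incomplete statistical sampling on gaps can be illustrated with a short 
calculation. The sketch below assumes that reads land uniformly at random along the genome, 
so that the chance a given base is missed at redundancy c is e^(-c). This Poisson 
idealization is an assumption made here for illustration and is not stated in the text. The 
function names and the 3 Mb example genome are hypothetical. 

```python
import math


def uncovered_fraction(redundancy):
    """Expected fraction of bases covered by no read.

    Assumes reads start uniformly at random (Poisson sampling), an
    idealization adopted for this sketch only.
    """
    return math.exp(-redundancy)


def expected_unsequenced_bases(genome_size_bp, redundancy):
    """Expected number of bases left unsequenced under the same assumption."""
    return genome_size_bp * uncovered_fraction(redundancy)


# Hypothetical 3 Mb bacterial genome at increasing sequencing redundancy.
for c in (1, 5, 10):
    missing = expected_unsequenced_bases(3_000_000, c)
    print(f"{c}x redundancy: about {missing:,.0f} bp expected to be unsequenced")
```

The model ignores cloning bias, which the paragraph above names as a second source of gaps, 
so real projects may leave more unsequenced sequence than this estimate suggests. 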

[0048] There are several disadvantages to the pure shotgun strategy: 1) As the 
size of the region to be sequenced increases, the effort of assembling a contiguous sequence 
from shotgun reads increases faster than N ln N, where N is the number of reads; 2) Repetitive 
DNA and sequencing errors can cause ambiguities in sequence assembly; and 3) Because 
subclones from the entire genome are sequenced at the same time and significant redundancy 
of sequencing is necessary to get contigs of moderate size, about 50% of the sequencing has 
to be finished before the sequence accuracy and the contig sizes are sufficient to get 
substantial information about the genome. Focusing the sequencing effort on one region is 
impossible. 

2. Directed shotgun sequencing 

[0049] The directed shotgun strategy, adopted by the Human Genome Project, 
reduces the difficulty of sequence assembly by limiting the analysis to one large clone at a 
time. This "clone-by-clone" approach requires four steps: 1) large-insert cloning, comprised 
of a) random fragmentation of the genome into segments 100,000 - 300,000 bp in size, b) 
cloning of the large segments, and c) isolation, selection and mapping of the clones; 2) 
random fragmentation and subcloning of each clone as thousands of short subclones; 3) 
sequencing random subclones and assembly of the overlapping sequences into contiguous 
regions; and 4) "finishing" the sequence by filling the gaps between contiguous regions and 
resolving inaccuracies. The positions of the sequences of the large clones within the genome 
are determined by the mapping steps, and the positions of the sequences of the subclones are 
determined by redundant sequencing of the subclones and computer assembly of the 
sequences of individual large clones. Substantial initial investment of resources and time is 
required for the first two steps before sequencing begins. This inhibits sequencing DNA 
from different species or individuals. Sequencing random subclones is highly inefficient, 
because significant gaps exist until the subclones have been sequenced to about 7X 
redundancy. Finishing requires "smart" workers and effort equivalent to an additional 3X 
sequencing redundancy. 

[0050] The directed shotgun sequencing method is more likely to finish a large 
genome than is pure shotgun sequencing. For the human genome, for example, the computer 
effort for directed shotgun sequencing is more than 20-fold less than that required for pure 
shotgun sequencing. 

[0051] There is an even greater need to simplify the sequencing and finishing 
steps of genomic sequencing. In principle this can be done by creating ordered libraries of 
DNA, giving uniform (rather than random) coverage, which would allow accurate sequencing 
with only about 3-fold redundancy and eliminate the finishing phase of projects. Current 
methods to produce ordered libraries are impractical, because they can cover only short 
regions (~5,000 bp) and are labor-intensive. 

F. Resequencing of DNA 

[0052] The presence of a known DNA sequence or variation of a known sequence 
can be detected using a variety of techniques that are more rapid and less expensive than de 
novo sequencing. These "resequencing" techniques are important for health applications, 
where determination of which allele or alleles are present has prognostic and diagnostic 
value. 

1. Microarray detection of specific DNA sequences 

[0053] The DNA from an individual human or animal is amplified, usually by 
PCR, labeled with a detectable tag, and hybridized to spots of DNA with known sequences 
bound to a surface. If the individual's DNA contains sequences that are complementary to 
those on one or more spots on the DNA array, the tagged molecules are physically detected. 
If the individual's amplified DNA is not complementary to the probe DNA in a spot, the 
tagged molecules are not detected. Microarrays of different design have different 
sensitivities to the amount of tested DNA and the exact amount of sequence complementarity 
that is required for a positive result. The advantage of the microarray resequencing technique 
is that many regions of an individual's DNA can be simultaneously amplified using multiplex 
PCR, and the mixture of amplified genetic elements hybridized simultaneously to a 
microarray having thousands of different probe spots, such that variations at many different 
sites can be simultaneously detected. 

[0054] One disadvantage to using PCR to amplify the DNA is that only one 
genetic element can be amplified in each reaction, unless multiplex PCR is employed, in 
which case only as many as 50-100 loci can be simultaneously amplified. For certain 
applications, such as SNP (single nucleotide polymorphism) screening, it would be 
advantageous to simultaneously amplify 1,000 - 100,000 elements and detect the amplified 
sequences simultaneously. A second disadvantage to PCR is that only a limited number of 
DNA bases can be amplified from each element (usually <2000 bp). Many applications 
require resequencing entire genes, which can be up to 200,000 bp in length. 
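
The scale of these two limitations can be made concrete with simple arithmetic. The sketch 
below uses the figures quoted above (50-100 loci per multiplex reaction, amplicons of at most 
2000 bp, genes of up to 200,000 bp). The function names and the optional `overlap_bp` 
parameter are hypothetical and introduced only for illustration. 

```python
import math


def reactions_needed(n_loci, loci_per_reaction):
    """Minimum number of multiplex PCR reactions needed to amplify n_loci."""
    return math.ceil(n_loci / loci_per_reaction)


def amplicons_needed(region_bp, max_amplicon_bp, overlap_bp=0):
    """Minimum number of amplicons needed to tile a region end to end.

    overlap_bp (hypothetical parameter) is the overlap between neighboring
    amplicons and must be smaller than max_amplicon_bp.
    """
    if overlap_bp >= max_amplicon_bp:
        raise ValueError("overlap must be smaller than the amplicon length")
    if region_bp <= max_amplicon_bp:
        return 1
    step = max_amplicon_bp - overlap_bp
    return 1 + math.ceil((region_bp - max_amplicon_bp) / step)


print(reactions_needed(100_000, 100))   # reactions for 100,000 SNP loci
print(amplicons_needed(200_000, 2000))  # amplicons for a 200,000 bp gene
```

Even at the upper limit of 100 loci per reaction, screening 100,000 elements would take on 
the order of a thousand separate reactions, and a single 200,000 bp gene would need about a 
hundred amplicons. 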

2. Other methods of resequencing 

[0055] Other methods such as mass spectrometry, secondary structure 
conformation polymorphism, ligation amplification, primer extension, and target-dependent 
cleavage can be used to detect sequence polymorphisms. All of these methods either require 
initial amplification of one or more specific genetic elements by PCR or incorporate other 
forms of amplification that have the same deficiencies as PCR, because they can amplify only 
a very limited region of the genome at one time. 

SUMMARY OF THE INVENTION 

[0056] A skilled artisan recognizes, based on the teachings provided herein, that 
deficiencies of existing methods for amplification of unknown DNA adjacent to known 
sequence can be solved by using nick translate molecule libraries. More particularly, the 
present invention teaches generating a library of nick translate molecules to amplify and 
sequence for the purpose of obtaining successive overlapping sequences from a plurality of 
nick translate molecules. 

[0057] In an object of the present invention, the primary PENTAmer library, in a 
specific embodiment, is prepared in vitro from a bacterial or human genome using the 
teachings provided herein. 

[0058] In another object of the present invention, the primary PENTAmer library 
generated in vitro from a genome, such as from a bacterium or a human, is amplified more 
than about 1000 times without any significant change in representation of the specific 
PENTAmer amplicons. 

[0059] In an additional object of the present invention, a primary PENTAmer 
library (directly or after amplification), such as from a bacterium or a human, is used to 
amplify a specific PENTAmer or a PENTAmer sub-pool preferably using only one 
sequence-specific primer, which generates templates that reproducibly produce high quality 
sequencing data. Typically, the methods described herein allow systematically generating 
from about 550 to 750 bases of a new sequence located downstream of the primer. 

[0060] In another object of the present invention, a primary eukaryotic (human) 
PENTAmer library (directly or after amplification) is used to amplify a specific PENTAmer 
or a PENTAmer sub-pool using two (or more) nested sequence-specific primers. 

[0061] In an additional object of the present invention, a circularized eukaryotic 
(human) PENTAmer library is used to amplify a specific PENTAmer or a PENTAmer 
sub-pool using inverse PCR and two (or more) sequence-specific primers. 

[0062] The present invention utilizes a library of nick translate molecules as a 
means to walk along a chromosome. A skilled artisan recognizes that the terms "walk," 
"walking," "chromosome walking," or "genome walking" are directed to the generation of 
unknown sequence from a sample nucleic acid, such as a genome, in a sequential manner by 
starting from a known sequence, in specific embodiments termed herein a "kernel," 
sequencing by a first sequencing reaction (called a "read"), and generating a second 
sequencing read from a region of sequence obtained in the first read. Thus, the two reads will 
overlap to some extent, and a consecutive series of such reactions results in the preferred 
walking embodiment of the invention. 
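
The sequential logic of a walk can be sketched as a small simulation. This is a conceptual 
model only and does not represent the laboratory procedure: the genome is a known string, 
each "read" is taken directly from that string, and a read is taken to include the priming 
region so that consecutive reads overlap by the primer length. The function name `walk`, its 
parameters, and the example sequence are all hypothetical. 

```python
def walk(genome, kernel, read_len, primer_len, max_steps):
    """Simulate walking downstream from a known kernel sequence.

    Each step primes on the last primer_len bases of the known sequence and
    reads read_len new bases beyond it. Returns the sequence known after the
    walk and the list of individual reads.
    """
    if len(kernel) < primer_len:
        raise ValueError("kernel must be at least as long as the primer")
    start = genome.find(kernel)
    if start < 0:
        raise ValueError("kernel not found in genome")
    known_end = start + len(kernel)  # index one past the known sequence
    reads = []
    for _ in range(max_steps):
        if known_end >= len(genome):
            break  # reached the end of the template
        primer_start = known_end - primer_len
        reads.append(genome[primer_start:known_end + read_len])
        known_end = min(len(genome), known_end + read_len)
    return genome[start:known_end], reads


genome = "ACGTTGCAAGGCTTAACCGGTATCGA"
assembled, reads = walk(genome, kernel="ACGTTG", read_len=5,
                        primer_len=3, max_steps=10)
print(assembled == genome, len(reads))
```

Under this model, with about 550 to 750 new bases per step as stated above, covering 10,000 
bases downstream of a kernel would take roughly 14 to 19 consecutive steps. 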

[0063] A skilled artisan is cognizant that any method to make an amplifiable nick 
translate molecule for chromosome walking is within the scope of the present invention. A 
skilled artisan also recognizes that, in a preferred method, the amplifiable nick translate 
molecule is generated by methods comprising at least fragmenting a DNA sample; attaching 
an adaptor to one end of the fragmented molecules, such as by covalent attachment, wherein 
the adaptor comprises a nick; nick translating with a DNA polymerase having 5'→3' 
polymerase activity and 5'→3' exonuclease activity; and attaching a second adaptor to the 
other end of the nick translated product. The nick translate molecule may be amplified by 
primer sequences for the adaptors. Although the nick is preferably generated by an adaptor 
comprising more than one oligonucleotide, wherein the oligonucleotide assembly has a nick 
between them, a skilled artisan recognizes that the nick may be generated by any standard 
means in the art. 

[0064] The following definitions are provided to assist in understanding the nature 
of the invention. 

[0065] The term "nick translate molecule" as used herein refers to nucleic acid 
molecules produced by coordinated 5'→3' polymerase activity, such as DNA polymerase, 
and 5'→3' exonuclease activity. The two activities can be present within one enzyme 
molecule (such as DNA polymerase I or Taq DNA polymerase). In a preferred embodiment, 
they have adaptor sequences at their 5' and 3' termini. 

[0066] The term "nick translation" as used herein refers to a coupled 
polymerization/degradation process that is characterized by a coordinated 5'→3' DNA 
polymerase activity and a 5'→3' exonuclease activity. 

[0067] The term "partial cleavage" as used herein refers to the cleavage by an 
endonuclease of a controlled fraction of the available sites within a DNA template. The 
extent of partial cleavage can be controlled by, for example, limiting the reaction time, the 
amount of enzyme, and/or reaction conditions. 

[0068] In an object of the present invention, there is a method of producing a 
consecutive overlapping series of nucleic acid sequences from a DNA sample, comprising the 
steps of generating a first amplifiable nick translation product, wherein said nick translation 
of said first amplifiable nick translation product initiates from a known nucleic acid sequence 
in the DNA sample; determining at least a partial sequence from said first nick translation 
product; and generating at least a second amplifiable nick translation product, wherein said 
nick translation of said second amplifiable nick translation product initiates from the partial 
sequence of said first nick translation product. 

[0069] In another object of the present invention there is a method of producing a 
library of consecutive overlapping series of nucleic acid sequences from a DNA sample 
comprising DNA molecules having a region comprising a known nucleic acid sequence, the 
method comprising the steps of digesting DNA molecules of the DNA sample with a first 
sequence-specific endonuclease to generate a plurality of DNA fragments; generating a first 
amplifiable nick translation product, wherein said nick translation of said first amplifiable 
nick translation product initiates from the known nucleic acid sequence; determining at least 
a partial sequence from said first nick translation product; and generating one or more 
additional amplifiable nick translation products, wherein said nick translation of said one or 
more amplifiable nick translation products initiates from the partial sequence of a previous 
nick translation product. In a specific embodiment, the method further comprises the step of 
digesting DNA molecules with at least a second sequence-specific endonuclease, wherein the 
preceding overlapping nick translation product is generated from a DNA fragment from 
digestion with the first sequence-specific endonuclease or from digestion with the second 
sequence-specific endonuclease. 

[0070] In an additional embodiment of the present invention, there is a method of 
producing a library of consecutive overlapping series of nucleic acid sequences, comprising 
the steps of obtaining a DNA sample comprising DNA molecules having a region comprising 
a known nucleic acid sequence; partially cleaving the DNA molecules with a 
sequence-specific endonuclease to generate a plurality of DNA ends; separating the cleaved 
DNA molecules; generating a first amplifiable nick translation product, wherein said nick 
translation of said first amplifiable nick translation product initiates from a known nucleic 
acid sequence; determining at least a partial sequence from said first nick translation product; 
and generating one or more amplifiable nick translation products, wherein said nick 
translation of said one or more amplifiable nick translation products initiates from the partial 
sequence of a previous nick translation product. In a specific embodiment, the separation of 
the cleaved DNA molecules is according to size. In another specific embodiment, the size 
separation is by gel size fractionation. In an additional specific embodiment, the nick 
translation products are amplified. 

[0071] In another specific embodiment, the amplification of the nick translation 
product comprises polymerase chain reaction utilizing a first primer specific to a known 
sequence in the nick translation product and a second primer specific to an adaptor sequence 
of the nick translation product. In an additional specific embodiment, at least one of the nick 
translation products is selectively amplified from the plurality of nick translation products. In 
a further specific embodiment, the nick translation product is single stranded. In an 
additional specific embodiment, the partial cleavage of the DNA molecules comprises 
cleaving for a selected time with a frequently cutting sequence-specific endonuclease, 
wherein the sequence-specificity of the endonuclease is to three or four nucleotide bases. 
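
How frequently such an endonuclease cuts, and how partial cleavage lengthens the resulting 
fragments, can be estimated with a short calculation. The sketch below assumes a random 
sequence with equal frequencies of the four bases and independent cutting at each site. These 
are simplifying assumptions made here and are not taken from the text, and the function names 
are hypothetical. 

```python
def mean_site_spacing(recognition_len):
    """Expected distance in bp between recognition sites.

    Assumes random sequence with equal base frequencies, so a specific
    n-base site occurs on average once every 4**n positions.
    """
    return 4 ** recognition_len


def mean_fragment_length(recognition_len, fraction_cut):
    """Expected fragment length when only a fraction of sites is cleaved."""
    if not 0 < fraction_cut <= 1:
        raise ValueError("fraction_cut must be in (0, 1]")
    return mean_site_spacing(recognition_len) / fraction_cut


print(mean_site_spacing(3))              # 64 bp for a three-base site
print(mean_site_spacing(4))              # 256 bp for a four-base site
print(mean_fragment_length(4, 0.25))     # 1024.0 bp if one site in four is cut
```

Under these assumptions, limiting the reaction so that only a fraction of sites is cut gives 
fragments several times longer than complete digestion, with ends distributed over many 
different sites. 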

[0072] In another specific embodiment, the partial cleavage of the DNA 
molecules comprises subjecting the DNA molecules to a methylase prior to subjection to a 
methylation-sensitive sequence-specific endonuclease. In a further specific embodiment, the 
selective amplification comprises introducing to said plurality of nick translation products a 
plurality of primers, wherein the primers comprise nucleotide base sequence complementary 
to an adaptor sequence in the nick translation product; an additional variable 3' terminal 
nucleotide; and a label; hybridizing the primers to their complementary nucleic acid 
sequences in the adaptor to form a mixture of primer/nick translate molecule hybrids; and 
extending from a primer having the 3' terminal nucleotide complementary to the nucleotide 
in the nick translate molecule immediately adjacent to the adaptor sequence, wherein the 
hybridizing and extending steps form a mixture of unextended primer/nick translate molecule 
hybrids and extended primer molecule/nick translate molecule hybrids. 
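
The selection achieved by a variable 3' terminal nucleotide can be illustrated with a toy 
model. In the sketch below each nick translate molecule is represented by its primer-sense 
strand (adaptor followed by insert), so that a fully matched primer 3' end reduces to a 
prefix test. This representation, the adaptor sequence "GGATCC", the example inserts, and the 
function names are hypothetical and serve only to illustrate the principle. 

```python
from itertools import product


def selective_primers(adaptor, n_selective):
    """All primers made of the adaptor sequence plus n_selective variable 3' bases."""
    return [adaptor + "".join(s) for s in product("ACGT", repeat=n_selective)]


def extended_by(primer, molecules):
    """Molecules on which the primer's 3' end is fully matched and can extend.

    Each molecule is given as the primer-sense strand, so a perfect match of
    the variable 3' bases is a simple prefix test (a modeling shortcut).
    """
    return [m for m in molecules if m.startswith(primer)]


adaptor = "GGATCC"  # hypothetical adaptor sequence
molecules = [adaptor + insert for insert in ("ACGTAC", "CTTGCA", "AGGCTA", "TTAGGC")]

one_base = extended_by(adaptor + "A", molecules)    # one variable 3' base
two_base = extended_by(adaptor + "AC", molecules)   # two variable 3' bases
print(len(selective_primers(adaptor, 1)), one_base)
print(len(selective_primers(adaptor, 2)), two_base)
```

Each added variable base divides the pool roughly fourfold in a library of random inserts, 
which is why primers with two or more variable 3' terminal nucleotides select a smaller 
sub-pool. 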

[0073] In a specific embodiment, the method further comprises binding of the 
mixture by the label to a support; washing the support-bound mixture to remove the nick 
translate molecules; and removing the support-bound extended molecule from the support. In 
an additional specific embodiment, the primer further comprises two or more variable 3' 
terminal nucleotides. In another specific embodiment, the method further comprises 
separating the nick translate molecules by size. In an additional specific embodiment, the 
size separation is by gel fractionation. In another specific embodiment, the method further 
comprises a step of subjecting the size-separated nick translate molecules to an additional 
amplification step. In a specific embodiment, the selective amplification step is by 
suppression PCR. In an additional specific embodiment, the suppression PCR utilizes a 
primer comprising a nucleic acid sequence for a primer specific for an adaptor sequence of 
the nick translate molecule; and nucleic acid sequence complementary to a region in a 
plurality of nick translate molecules, whereby the nucleic acid sequence is 5' to the sequence 
for a primer specific for an adaptor sequence of the nick translate molecule. 

[0074] In an object of the present invention, in the method the at least one nick 
translate molecule is amplified by primer extension/ligation reactions. In a further specific 
embodiment, the method further comprises immobilization of the nick translation molecules 
onto a solid support. In a specific embodiment, the solid support is a magnetic bead. In 
another specific embodiment, the primer extension/ligation reactions comprise initiating and 
extending the primer extension reaction with a first primer which is complementary to 
sequence in a subset of the plurality of nick translate molecules, wherein the complementary 
sequence of the nick translate molecule is adjacent to a first adaptor end of the nick translate 
molecule; and ligating an oligonucleotide to the 5' end of the extension product, wherein the 
oligonucleotide comprises sequence complementary to the first adaptor of the nick translate 
molecule and also comprises a sequence for binding by a second primer, wherein the second 
primer binding sequence in the oligonucleotide is 5' to the first adaptor complementary 
sequence in the oligonucleotide. In a further specific embodiment, the method further 
comprises amplifying the primer extended molecule. In another specific embodiment, the 
method further comprises separating the primer extended molecule from the plurality of nick 
translate molecules. 

[0075] In an additional specific embodiment, the nick translate molecules were 
generated in the presence of dU nucleotides, the primer extended molecule contains no dU 
nucleotides, and wherein the separating step comprises degradation of the plurality of nick 
translate molecules by dU-glycosylase. In another specific embodiment, the amplification 
step comprises polymerase chain reaction using the second primer and a primer 
complementary to a second adaptor of the nick translate molecule. In a further specific 
embodiment, the ligation/primer extension reactions comprise ligating in a head-to-tail 
orientation a plurality of oligonucleotides to form an oligonucleotide assembly, wherein the 
oligonucleotides are complementary to nick translate molecule sequence adjacent to a first 
adaptor end of the nick translate molecule and wherein the nick translate molecule sequence 
is present in a subset of the plurality of nick translate molecules, wherein the nick translation 
molecule has the first adaptor on one terminal end and a second adaptor on the other terminal 
end; initiating and extending the primer extension reaction with the 3' end of the 
oligonucleotide assembly; and ligating an oligonucleotide to the 5' end of the extension 
product, wherein the oligonucleotide comprises sequence complementary to the first adaptor 
of the nick translate molecule and also comprises sequence for binding by a first primer, 
wherein the first primer binding sequence is 5' to the first adaptor complementary sequence 
in the oligonucleotide. 

[0076] In another specific embodiment, the method further comprises the steps of 
separating the primer extended molecule from the plurality of nick translate molecules; and 
amplifying the primer extended molecule. In an additional specific embodiment, the nick 
translate molecules were generated in the presence of dU nucleotides, the primer extended 
molecule contains no dU nucleotides, and wherein the separating step comprises degradation 
of the plurality of nick translate molecules by dU-glycosylase. In another specific 
embodiment, the amplification step comprises polymerase chain reaction using the first 
primer and a second primer complementary to the second adaptor of the nick translate 
molecule. In an additional specific embodiment, the primer extension/ligation reaction 
comprises initiating and extending the primer extension reaction with a first primer which is 
complementary to sequence in a subset of the plurality of nick translate molecules, wherein 
the nick translate molecule sequence is adjacent to a first adaptor end of the nick translate 
molecule; and ligating an oligonucleotide to the 5' end of the extension product, wherein the 
oligonucleotide comprises (1) sequence complementary to the first adaptor of the nick 
translate molecule; (2) sequence for binding by a second primer, wherein the second primer 
binding sequence is 5' to the sequence in (1); and (3) a label at the 5' end. 

[0077] In an additional specific embodiment, the method further comprises the 
steps of separating the primer extended molecule from the plurality of nick translate 
molecules by the label of the oligonucleotide; and amplifying the primer extended molecule. 

[0078] In a specific embodiment, the label is biotin. In another specific 
embodiment, the separation further comprises streptavidin-coated magnetic beads. In a 
further specific embodiment, the amplification step comprises polymerase chain reaction 
using the second primer and a third primer complementary to a second adaptor of the nick 
translate molecule. 

[0079] In an additional object of the present invention there is a method of 
sequencing nucleic acid, comprising the steps of obtaining a DNA sample comprising DNA 
molecules having a region comprising a known nucleic acid sequence; partially cleaving the 
DNA molecules with a sequence-specific endonuclease to generate a plurality of DNA ends; 
separating the cleaved DNA molecules; generating a first amplifiable nick translation 
product, wherein the first amplifiable nick translation product comprises an adaptor at each 
end, wherein the nick translation of said first amplifiable nick translation product initiates 
from a known nucleic acid sequence; determining at least a partial sequence from said first 
nick translation product; and generating one or more additional amplifiable nick translation 
products, wherein said nick translation of said one or more additional amplifiable nick 
translation products initiates from the partial sequence of a previous nick translation product; 
and sequencing the nick translation products, wherein the amplified nick translation product 
is not subjected to cloning prior to the sequencing reaction. In a specific embodiment, the 
DNA sample is a genome. In another specific embodiment, there is a limited amount of 
DNA sample. In an additional specific embodiment, the amplification is by polymerase 
chain reaction, and one of the primers for the polymerase chain reaction is used as a primer 
for the sequencing reaction. In a further specific embodiment, at least a portion of the 
adaptor sequence is removed from the amplified nick translation molecule. In another 
specific embodiment, the removal step comprises subjecting the amplified nick translation 
molecule to a 5' exonuclease. In an additional specific embodiment, a region of the adaptor 
sequence of the nick translate molecule comprises a dU nucleotide and the removal 
comprises degradation by dU-glycosylase. In a further specific embodiment, a region of the 
adaptor sequence comprises a ribonucleotide and the removal comprises degradation by 
alkaline hydrolysis. In another specific embodiment, the region of the second adaptor 
sequence is in a 3' region of the second adaptor sequence. 

[0080] In an additional object of the present invention, there is a method of 
providing sequence for a gap in a genome sequence, comprising the steps of obtaining a DNA 
sample of the genome comprising DNA molecules having a region comprising a known 
nucleic acid sequence adjacent to the gap; digesting the DNA molecules with a plurality of 
sequence-specific endonucleases to generate a plurality of DNA ends; generating a first 
amplifiable nick translation product, wherein said nick translation of said first amplifiable 
nick translation product initiates from the known nucleic acid sequence; determining at least 
a partial sequence from said first nick translation product; and generating one or more 
additional amplifiable nick translation products, wherein said nick translation of said one or 
more amplifiable nick translation products initiates from the partial sequence of a previous 
nick translation product, wherein at least one of the amplifiable nick translation products 
comprises sequence of the gap. In a specific embodiment, the genome is a bacterial genome. 
In a specific embodiment, the genome is a plant genome. In a specific embodiment, the 
genome is an animal genome. In a specific embodiment, the animal genome is a human 
genome. In an additional specific embodiment, the bacteria are unculturable. In an 
additional specific embodiment, the bacterium is present in a plurality of bacteria. 

[0081] In an additional object of the present invention, there is a method of 
producing a library of consecutive overlapping series of nucleic acid sequences from a DNA 
sample, comprising the steps of obtaining the DNA sample comprising a DNA molecule; 
digesting the DNA molecule with a first sequence-specific endonuclease to generate a 
plurality of DNA fragments, wherein at least one DNA fragment has a region comprising a 
known nucleic acid sequence; attaching a first adaptor molecule to ends of the DNA 
fragments to provide a nick translation initiation site, wherein the first adaptor comprises a 
label; subjecting the first adaptor-bound DNA fragment to nick translation comprising DNA 
polymerization and 5'-3' exonuclease activity, wherein the nick translation initiates from the 
known nucleic acid sequence, to generate a first nick translation product; isolating the nick 
translation product by the label; attaching a second adaptor molecule to the first nick translate 
product; determining at least a partial sequence from the first nick translation product; and 
generating one or more additional amplifiable nick translation products, wherein said nick 
translation of said one or more amplifiable nick translation products initiates from the partial 
sequence of a previous nick translation product. In a specific embodiment, the label is biotin 
and the isolation step is binding to streptavidin-coated magnetic beads. 

[0082] In another object of the present invention, there is a method of producing a 
library of consecutive overlapping series of nucleic acid sequences, comprising the steps of 
obtaining a DNA sample comprising DNA molecules having a region comprising a known 
nucleic acid sequence; partially cleaving the DNA molecules with a sequence-specific 
endonuclease to generate a plurality of DNA fragments, wherein at least one DNA fragment 
has a region comprising a known nucleic acid sequence; separating the cleaved DNA 
fragments; attaching a first adaptor molecule to ends of the DNA fragments to provide a nick 
translation initiation site, wherein the first adaptor comprises a label; subjecting the first 
adaptor-bound DNA fragment to nick translation comprising DNA polymerization and 5'-3' 
exonuclease activity, wherein the nick translation initiates from the known nucleic acid 
sequence, to generate a first nick translation product; isolating the nick translation product by 
the label; attaching a second adaptor molecule to the first nick translate products; determining 
at least a partial sequence from said first nick translation product; and generating one or more 
additional amplifiable nick translation products, wherein said nick translation of said one or 
more amplifiable nick translation products initiates from the partial sequence of said first nick 
translation product. In a specific embodiment, the separation of the DNA fragments is by 
size. In another specific embodiment, the size separation is by electrophoresis. 

[0083] In another object of the present invention, there is a library of consecutive 
overlapping series of nucleic acid sequences from a DNA sample, wherein the library is 
generated by the methods described herein. 

BRIEF DESCRIPTION OF THE FIGURES 

[0084] The following drawings form part of the present specification and are 
included to further demonstrate certain aspects of the present invention. The invention may 
be better understood by reference to one or more of these drawings in combination with the 
detailed description of specific embodiments presented herein. 

[0085] FIG. 1 illustrates genome walking by sequential amplification of the 
overlapping PENTAmers. 

[0086] FIG. 2 demonstrates types of PENTAmer libraries. 

[0087] FIGS. 3A and 3B illustrate the general strategy of genome walking by a 
targeted amplification of the overlapping PENTAmers. 

[0088] FIGS. 4A and 4B illustrate synthesis of the primary PENTAmer library 
from a genomic DNA completely digested with a restriction endonuclease. 

[0089] FIGS. 5A and 5B show synthesis of the primary PENTAmer library from a 
partially digested genomic DNA. 

[0090] FIG. 6 demonstrates premature termination of the PENTAmer synthesis on 
short DNA fragments. 

[0091] FIG. 7 illustrates amplification of the PENTAmer library produced by a 
partial restriction digestion using conventional PCR. 

[0092] FIGS. 8A and 8B show one-base selection by primer-extension/affinity 
capture procedure. 

[0093] FIG. 9 demonstrates reducing the PENTAmer library complexity by 
primer extension/polymerase chain reaction with primer-selector A. 

[0094] FIG. 10 illustrates genome walking using overlapping PENTAmer library, 
conventional PCR, and DNA size fractionation-pooling strategy. 

[0095] FIG. 11 illustrates amplification of the PENTAmer library produced by a 
partial restriction digestion using suppression PCR. 

[0096] FIG. 12 illustrates preparation of the immobilized single-strand 
complementary PENTAmer library for the selection-amplification procedure. 

[0097] FIGS. 13A and 13B show targeted PENTAmer amplification by primer 
extension-ligation-Method I. 

[0098] FIGS. 14A and 14B demonstrate targeted PENTAmer amplification by 
modular oligonucleotide assembly-Method II. 

[0099] FIGS. 15A and 15B demonstrate targeted PENTAmer amplification by 
modular oligonucleotide assembly-Method III. 

[0100] FIGS. 16A and 16B demonstrate PENTAmer selection by primer 
extension/ligation followed by magnetic bead capture. 

[0101] FIG. 17 shows sequencing of two overlapping fragments L and S 
generated by amplification of PENTAmer library (following partial restriction digestion) 
using unique primer P and universal primer B. 

[0102] FIG. 18 illustrates sequencing gaps in a genome, such as a bacterial 
genome, using primary PENTAmer libraries. 

[0103] FIG. 19 demonstrates positional genome walking by targeted PENTAmer 
amplification. 

[0104] FIG. 20 demonstrates PCR amplification of genomic BamH I PENTAmer 
E. coli library and selected kernel sequences. 

[0105] FIG. 21 illustrates schematic presentation of assembly of short 
oligonucleotides on E. coli BamH I PENTAmer library template. 

[0106] FIG. 22 demonstrates assembly of short oligonucleotides at specific E. coli 
genomic kernel sequence by thermo-stable DNA ligase using secondary E. coli genomic 
BamH I PENTAmer library as template. 

[0107] FIG. 23 shows selection of specific E. coli PENTAmer sequence by 
assembly of short oligonucleotides followed by extension with DNA polymerase and ligation 
of universal adaptor oligonucleotide at adaptor A using secondary E. coli genomic BamH I 
PENTAmer library as template. 

[0108] FIG. 24 demonstrates PCR analysis of forty kernel sites in primary 
PENTAmer library from E. coli Sau3A I partial genomic digest. 

[0109] FIG. 25 shows PCR analysis of two kernel sites in PENTAmer library 
from E. coli Sau3A I partial genomic digest after size separation. 

[0110] FIG. 26 demonstrates PCR analysis of three kernel sequences selected by 
multiplexed linear amplification from secondary E. coli PENTAmer library derived from 
Sau3A I partial digest. 

[0111] FIG. 27 shows PCR amplification of PENTAmer libraries prepared from 
human genomic DNA after partial Sau3A I or complete BamH I restriction digest. 

[0112] FIG. 28 shows circularization of single-stranded human genomic DNA 
Sau3A I PENTAmer library. 

[0113] FIG. 29 demonstrates PCR amplification of single-stranded circular Sau3A 
I human PENTAmer library and a kernel sequence. 

[0114] FIG. 30 shows nested PCR amplification of kernel human genomic 
sequence from primary BamH I and Sau3A I PENTAmer libraries. 

[0115] FIG. 31 illustrates schematic presentation of regions in the 10 Kb human 
tp53 gene amplified by nested PCR from primary BamH I and Sau3A I libraries. 

[0116] Other objects, features and advantages of the present invention will 
become apparent from the following detailed description. It should be understood, however, 
that the detailed description and the specific examples, while indicating preferred 
embodiments of the invention, are given by way of illustration only, since various changes 
and modifications within the spirit and scope of the invention will become apparent to those 
skilled in the art from this detailed description. 

DETAILED DESCRIPTION OF THE INVENTION 

[0117] This application herein incorporates by reference in its entirety United 
States Application Serial No. 09/860,738 filed May 18, 2001. 

[0118] As used herein the specification, "a" or "an" may mean one or more. As 
used herein in the claim(s), when used in conjunction with the word "comprising", the words 
"a" or "an" may mean one or more than one. As used herein "another" may mean at least a 
second or more. As used herein, the term "nick translate molecule" is used interchangeably 
with the terms "PENTAmer" or "nick translate product." 

I. Generation of a Nick Translate Molecule 

[0119] The present invention is directed to chromosome walking through the 
generation of nick translate molecules, and a skilled artisan recognizes that the nick translate 
molecules may be generated by any standard means in the art. However, in a preferred 
embodiment, the nick translate molecules are adaptor attached nick translate molecules 
(designated a PENTAmer). 

[0120] The method for creating an adaptor attached nick translate molecule 
provides a powerful tool useful in overcoming many of the difficulties currently faced in 
large scale DNA manipulation, particularly genomic sequencing. 

A. Primary PENTAmer 

[0121] In the simplest implementation, a primary PENTAmer is generated by: 

[0122] 1) Ligating a nick-translation first adaptor to the proximal end of the 
source DNA (the template); 

[0123] 2) Initiating a nick translation reaction at the nick site of said adaptor 
using a DNA polymerase having 5'-3' exonuclease activity; 

[0124] 3) Elongating the PENT product a specific time; and 

[0125] 4) Appending a nick-ligation second adaptor to the distal, 3' end of the 
PENT product to form a PENTAmer-template hybrid ("nascent PENTAmer"). 

[0126] While this basic technique sets forth the primary methodology envisioned 
by the inventors to create a PENTAmer product, it would be clear to one of ordinary skill that 
changes could be made in order to achieve an analogous outcome. 

[0127] In a specific embodiment, the PENT reaction is initiated, continued, and 
terminated on a largely double-stranded template, which gives the PENTAmer amplification 
important advantages for creating DNA for sequence analysis. An advantage of using 
PENTAmers to amplify different regions of the template is the fact that in most applications 
PENTAmers having different internal sequences have the same terminal sequences. These 
advantages are important for creating PENTAmers that are most useful as intermediates for 
in vitro or in vivo amplification. Amplification of these intermediates is more useful than 
direct amplification of DNA by cloning or PCR. 

[0128] During later steps, the PENTAmers can be degraded by incorporating 
distinguishable nucleotides during the reaction. For example, incorporation of dU 
nucleotides and subsequent exposure to dU-glycosylase allows destruction of the 
PENTAmers for separation from, for example, a desired nucleic acid molecule lacking the dU 
nucleotides. 

[0129] The initiation site for a PENT reaction (as distinct from an oligonucleotide 
primer) can be introduced by any method that results in a free 3' OH group on one side of a 
nick or gap in otherwise double-stranded DNA, including, but not limited to such groups 
introduced by: a) digestion by a restriction enzyme under conditions that only one strand of 
the double-stranded DNA template is hydrolyzed; b) random nicking by a chemical agent or 
an endonuclease such as DNAase I; c) nicking by f1 gene product II or homologous enzymes 
from other filamentous bacteriophage (Meyer and Geider, 1979); and/or d) chemical nicking 
of the template directed by triple-helix formation (Grant and Dervan, 1996). 

[0130] However, for PENTAmer synthesis, the primary means of initiation is 
through the ligation of an oligonucleotide primer onto the target nucleic acid. This very 
powerful and general method to introduce an initiation site for strand replacement synthesis 
employs a panel of special double-stranded oligonucleotide adaptors designed specifically to 
be ligated to the termini produced by restriction enzymes. Each of these adaptors is designed 
such that the 3' end of the restriction fragment to be sequenced can be covalently joined 
(ligated) to the adaptor, but the 5' end cannot. Thus the 3' end of the adaptor remains as a 
free 3' OH at a 1 nucleotide gap in the DNA, which can serve as an initiation site for the 
strand-replacement sequencing of the restriction fragment. Because the number of different 
3' and 5' overhanging sequences that can be produced by all restriction enzymes is finite, and 
the design of each adaptor will follow the same simple strategy, above, the design of every 
one of the possible adaptors can be foreseen, even for restriction enzymes that have not yet 
been identified. To facilitate sequencing, a set of such adaptors for strand replacement 
initiation can be synthesized with labels (radioactive, fluorescent, or chemical) and 
incorporated into the dideoxyribonucleotide-terminated strands to facilitate the detection of 
the bands on sequencing gels. 

[0131] More specifically, adaptors with 5' and 3' extensions can be used in 
combination with restriction enzymes generating 2-base, 3-base and 4-base (or more) 
overhangs. The sense strand of the adaptor has a 5' phosphate group that can be efficiently 
ligated to the restriction fragment to be sequenced. The anti-sense strand (bottom, underlined) 
is not phosphorylated at the 5' end and is missing one base at the 3' end, effectively 
preventing ligation between adaptors. This gap does not interfere with the covalent joining of 
the sense strand to the restriction fragment, and leaves a free 3' OH site in the anti-sense 
strand for initiation of strand replacement synthesis. 

[0132] Polymerization may be terminated specific distances from the priming site 
by inhibiting the polymerase a specific time after initiation. For example, under specific 
conditions Taq DNA polymerase is capable of strand replacement at the rate of 250 
bases/min, so that arrest of the polymerase after 10 min occurs about 2500 bases from the 
initiation site. This strategy allows for pieces of DNA to be isolated from different locations 
in the genome. 
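
The proportionality described in the preceding paragraph can be illustrated with a minimal sketch. It assumes a constant strand-replacement rate, which is an idealization; the 250 bases/min figure is the example value given above, and the function names are hypothetical.

```python
def pent_product_length(rate_bases_per_min: float, minutes: float) -> float:
    """Approximate distance (in bases) from the initiation site at which the
    polymerase is arrested, assuming a constant strand-replacement rate."""
    if rate_bases_per_min < 0 or minutes < 0:
        raise ValueError("rate and time must be non-negative")
    return rate_bases_per_min * minutes


def arrest_time_for_length(target_bases: float, rate_bases_per_min: float) -> float:
    """Time (in minutes) after initiation at which the polymerase would be
    inhibited to reach a target product length under the same assumption."""
    if rate_bases_per_min <= 0:
        raise ValueError("rate must be positive")
    return target_bases / rate_bases_per_min


if __name__ == "__main__":
    # Example from the text: 250 bases/min for 10 min
    print(pent_product_length(250, 10))
    # Hypothetical target: a 1000-base product at the same rate
    print(arrest_time_for_length(1000, 250))
```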

[0133] PENT reactions may also be terminated by incorporation of a 
dideoxyribonucleotide instead of the homologous naturally-occurring nucleotide. This 
terminates growth of the new DNA strand at one of the positions that was formerly occupied 
by dA, dT, dG, or dC by incorporating ddA, ddT, ddG, or ddC. In principle, the reaction can 
be terminated using any suitable nucleotide analogs that prevent continuation of DNA 
synthesis at that site. 

B. Secondary PENTAmers 

[0134] Secondary PENTAmers are created by two nick-translation reactions. The 
length of the first PENT reaction determines the distance of one end of the secondary 
PENTAmer from the initiation position, whereas the second (shorter) PENT reaction 
determines the length of the secondary PENTAmer. The advantage of secondary 
PENTAmers is that the position of the PENTAmer within the template DNA and the length 
of the PENTAmer are independently controlled. 

[0135] There are two methods to synthesize a secondary PENTAmer. In the first 
method, a secondary PENTAmer is created and amplified by: 

[0136] Ligating a first terminus-attaching, nick translation adaptor to the proximal 
end of the template DNA molecule; 

[0137] Initiating a first PENT reaction at the proximal end of the source DNA 
molecule using a first adaptor; 

[0138] Elongating the first PENT product a specified time; 

[0139] Appending a second nick-attaching adaptor to the distal, 3' end of the first 
PENT product; 

[0140] Initiating a second PENT reaction at the same proximal end of the source 
DNA molecule using the first adaptor; 

[0141] Elongating the second PENT product a specified time; 

[0142] Appending a third nick-attaching adaptor to the 5' end of the degraded first 
PENT product; 

[0143] (Optionally) separating the single-stranded secondary PENTAmer of 
length from the template (e.g., by denaturation); 

[0144] In a second method, a secondary PENTAmer is created by: 

[0145] Ligating a first terminus-attaching, nick translation adaptor to the proximal 
end of the template DNA molecule; 

[0146] Initiating a first PENT reaction at the proximal end of the source DNA 
molecule using the first adaptor; 

[0147] Elongating the PENT product a specified time; 

[0148] Appending a second nick-attaching adaptor to the distal, 3' end of the 
PENT product; 

[0149] Separating the single-stranded primary PENTAmer from the template; 

[0150] Replicating the second strand of the primary PENTAmer using primer 
extension; 

[0151] Initiating a second PENT reaction at the upstream end of the secondary 
PENTAmer; 

[0152] Elongating the secondary PENT product a specified time; 

[0153] Appending a third nick-attaching adaptor to the 3' end of the secondary 
PENT product; and 

[0154] (Optionally) separating the single-stranded secondary PENTAmer from 
the template. 

C. Recombinant PENTAmers 

[0155] The difficulty of immobilizing very large DNA fragments may be 
overcome by bringing together sequences from both the proximal and distal ends of long 
templates to create a recombinant PENTAmer. 

[0156] A recombinant PENTAmer is made on a single template molecule, having 
different structures at the left (proximal) and right (distal) ends. 

[0157] 1) The first end of a recombination adaptor RA is attached to the left, 
proximal end of the template; 

[0158] 2) The second end of a recombination adaptor RA is attached to the right, 
distal end, to form a circular molecule; and 

[0159] 3) The initiation domain of adaptor RA is used to synthesize a 
PENTAmer containing the distal template sequences. 

[0160] PENTAmers will only be created on those fragments that have been 
ligated to both ends of the recombination adaptor RA. Specific designs and use of 
recombination adaptors would be apparent to a skilled artisan. One embodiment uses an 
adaptor RA comprising a first ligation domain complementary to the proximal terminus of 
the template, an activatable second ligation domain complementary to the distal terminus, and 
a nick-translation initiation domain capable of translating the nick from the distal end toward 
the center of the template. In the case of a recombination adaptor of that specific design, the 
template would be made resistant to cleavage by the activation restriction enzyme by 
methylation at the restriction recognition sites, and the second step would be executed in the 
following way: 1) removal of unligated adaptor RA from solution, 2) activation of adaptor 
RA by restriction digestion of the unmethylated site within the adaptor, 3) dilution of the 
template, 4) ligation of the second ligation domain to the distal end of the template, and 5) 
concentration of the circularized molecules. Step 3 is executed by the same methods used to 
create a primary PENTAmer, however the nick-translation initiates at the initiation domain of 
an RA adaptor. 

[0161] The PENTAmer formed can be amplified by any of the methods described 
earlier, e.g., by PCR using primers complementary to sequences in adaptors. 

D. Adaptors 

[0162] A preferred design of a nick-translation adaptor is formed by annealing 3 
oligonucleotides (or more): oligonucleotide 1, oligonucleotide 2 and oligonucleotide 3. The 
left ends of these adaptors are designed to be ligated to double-stranded ends of template 
DNA molecules and used to initiate nick-translation reactions. Oligonucleotide 1 has a 
phosphate group (P) at the 5' end and a blocking nucleotide at the 3' end, a non-specified 
nucleotide composition and length from about 10 to 200 bases. Oligonucleotide 2 has a 
blocked 3' end, a non-phosphorylated 5' end, a nucleotide sequence complementary to the 5' 
part of oligonucleotide 1 and length from about 5 to 195 bases. When hybridized together, 
oligonucleotides 1 and 2 form a double-stranded end designed to be ligated to the 3' strand at 
the end of a template molecule. To be compatible with a ligation reaction to the end of a 
DNA restriction fragment, a nick-translation adaptor can have blunt, 5'-protruding or 3'- 
protruding end. Oligonucleotide 3 has a 3' hydroxyl group, a non-phosphorylated 5' end, a 
nucleotide sequence complementary to the 3' part of oligonucleotide 1, and length from about 
5 to 195 bases. When hybridized to oligonucleotide 1, oligonucleotides 2 and 3 form a nick or 
a few base gap within the lower strand of the adaptor. Oligonucleotide 3 can serve as a primer 
for initiation of the nick-translation reaction. 

[0163] Other nick-attaching adaptors are partially double-stranded or completely 
single-stranded short DNA molecules that can be covalently linked to the 3' hydroxyl group 
of the nick-translation DNA product. Nick-translation DNA product can be a single-stranded 
molecule isolated from its DNA template or the nick-translation product still hybridized to 
the template DNA. The nick-attaching adaptors are designed to complete the synthesis of the 
3' end of PENTAmers. 

II. Chromosome walking using primary PENTAmer library-General Embodiments 

[0164] PENTAmer walking is achieved by priming-selection and amplification of 
a limited number of PENTAmer molecules with a known sequence at their 5' end (FIG. 1). 
At every step a new DNA sequence located downstream from the primer(s) is generated. In a 
preferred embodiment, the predicted size of the amplicon guarantees the success of each 
walking step; that is, the amount of sequence information generated at each step is equal to 
the PENTAmer amplicon size (for example, 1 kb). In practice, the new sequence identified at 
each walking step is limited by existing DNA sequencing technology and usually does not 
exceed about 750 bp. To guarantee the success of the proposed walking strategy, the nick- 
translate library should be redundant to the extent that at each step the 5' end of the nick- 
translate molecule can be identified, the molecule primed, amplified and sequenced. In 
principle, one library and one amplification is necessary at each step. 

[0165] Depending on frequency of DNA cleavage with a restriction enzyme, the 
corresponding primary PENTAmer library would result in a different level of coverage of 
genomic DNA. For example, the PENTAmer library prepared from DNA fragments after Sfi 
I and BamH I digestion will have an average of about two PENTAmer molecules per 60 kb 
and 10 kb, respectively (FIG. 2A and 2B) leaving substantial gaps between consecutive 
PENTAmer molecules (PENTAmers generated at both strands of DNA are herein considered 
separately: C- and W-PENTAmers). The PENTAmer library prepared after partial restriction 
digestion of DNA with a frequently cutting endonuclease Sau3A I will have an average of 8 
molecules per 1 Kb. At the size of the PENTAmer amplicon of 1 Kb, the levels of 
redundancy for those cases A, B and C shown on FIG. 2 are 0.03, 0.2 and 8, respectively. 
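
The redundancy figures quoted above are consistent with a simple ratio: the average number of PENTAmer molecules per kilobase multiplied by the amplicon size. The following sketch reproduces the three values under that assumed reading; the function name and the case labels are illustrative only.

```python
def library_redundancy(pentamers_per_window: float,
                       window_kb: float,
                       amplicon_kb: float = 1.0) -> float:
    """Average number of PENTAmer amplicons covering any given position,
    taken here as (molecules per kb) x (amplicon size in kb)."""
    if window_kb <= 0 or amplicon_kb <= 0:
        raise ValueError("window and amplicon sizes must be positive")
    return pentamers_per_window / window_kb * amplicon_kb


# (molecules, window in kb) for cases A, B and C, using the values given in the text
cases = {
    "A (Sfi I, complete)": (2, 60),
    "B (BamH I, complete)": (2, 10),
    "C (Sau3A I, partial)": (8, 1),
}

if __name__ == "__main__":
    for name, (molecules, window_kb) in cases.items():
        print(name, round(library_redundancy(molecules, window_kb), 2))
```

A redundancy below 1 means that most positions are not covered by any amplicon of that library, which is why libraries A and B leave gaps while library C overlaps heavily.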

A. Genome walking by amplification of PENTAmers from libraries 
prepared by complete digestion with several different restriction 
endonucleases 

[0166] In this approach, several (N) nick-translate (PENTAmer) sub-libraries are 
produced from DNA obtained by a complete digestion with N different non-frequently 
cutting restriction enzymes R1-Rn (FIG. 3A). Because there is no overlap between 
PENTAmers within one sub-library, the redundancy of total coverage is achieved by 
preparing PENTAmer sub-libraries from several DNA restriction digests. 

[0167] FIGS. 4A and 4B illustrate the preparation of the primary PENTAmer 
library for a given restriction enzyme Rn presented in the following Protocol 1: 

1. Protocol 1: Preparation of the primary PENTAmer libraries by a 
complete digestion with different restriction enzymes 

c. Split DNA into N tubes containing N different restriction 
enzymes and corresponding buffer, and digest to completion. The most suitable enzymes are 
the restriction endonucleases with 6-base specificity as, for example, BamH I, EcoR I, Hind 
III, etc. A skilled artisan is aware that there are more than 100 enzymes of this type currently 
available on the market. Stop the reaction by adding EDTA or/and by heating at 65-75° C. 

d. Incubate DNA samples with the alkaline phosphatase for an 
appropriate time to remove the phosphate group from all 5' DNA restriction fragments (this 
step is optional). Purify DNA by phenol/chloroform extraction-ethanol precipitation or 
using commercially available DNA purification kits. 

e. Ligate the nick-translation adaptor A to all DNA ends. Purify 
DNA. 

f. Incubate with a DNA polymerase possessing 5' exonuclease 
activity (for example, non-mutated Taq DNA polymerase) for a specific time to synthesize 
DNA molecules of a controlled size (PENT products). 

g. Isolate PENT molecules by capturing on the streptavidin- 
coated magnetic beads. 

h. Ligate the second adaptor B to the 3' ends of immobilized 
PENT molecules. 

[0168] At this point, N different primary PENTAmer sub-libraries are generated. 
The sub-libraries can be additionally amplified if necessary using universal primers A and B. 

[0169] FIG. 3A illustrates the case when 10 individual PENTAmer libraries 
constitute a walking nick-translate DNA library. The figure shows a DNA region covered by 
21 PENTAmer amplicons originated from the bottom C-strand of DNA. The walking process 
starts at the right end where the DNA sequence is known. The selection of the specific 
PENTAmer molecule Pn is achieved in the two steps: first, when choosing the corresponding 
sub-library Rn for the amplification; and second, when amplifying the DNA fragment using 
sequence-specific primer Pr(n) and universal adaptor-specific primer B. Because there is no 
overlap between PENTAmers within one sub-library the exact location of the sequence- 
specific primer is not important except that it should anneal to DNA downstream of the 
restriction site. 

[0170] For example, amplification and sequencing of the molecule P1 using sub- 
library R1 and primers Pr 1 and B results in identification of the restriction site R4 within 
the 3' end of the same molecule. At the next step, individual sub-library R4 and primers Pr 2 
and B are used to amplify and sequence the molecule P4. The restriction site R6 is identified 
at the 3' end of the P4 DNA molecule and the molecule is amplified and sequenced using 
library and primers Pr 3 and B. As a result, a minimal tiling path is created by the 
sequential amplification and sequencing of the molecules P1, P4, P6, P7, Pi*, and Ps from the 
corresponding nick-translate sub-libraries R1, R4, R6, R7, Ru and Rg. 
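
The stepwise selection of sub-libraries described above can be pictured with a toy model. The sketch below is an illustration only, not the procedure of the invention: it replaces sequences with coordinates, assumes each PENTAmer starts at a restriction site and spans a fixed amplicon length, and at each step greedily picks the site lying furthest into the already known region. The function name, the site positions, the amplicon length and the primer length are all hypothetical.

```python
from typing import Dict, List, Tuple


def walk(sites: Dict[str, List[int]],
         known_end: int,
         target: int,
         amplicon: int = 1000,
         primer_len: int = 20) -> List[Tuple[str, int]]:
    """Greedy walk along a linear coordinate axis.

    sites maps a sub-library name to the restriction-site positions at which
    its PENTAmers start. Returns the (sub-library, site) chosen at each step.
    """
    path: List[Tuple[str, int]] = []
    while known_end < target:
        # A usable site lies inside known sequence, with room for a
        # sequence-specific primer downstream of the site.
        candidates = [(pos, name)
                      for name, positions in sites.items()
                      for pos in positions
                      if pos + primer_len <= known_end]
        if not candidates:
            break
        pos, name = max(candidates)      # site furthest along the known region
        new_end = pos + amplicon         # sequencing extends the known region
        if new_end <= known_end:
            break                        # gap: no PENTAmer reaches new sequence
        path.append((name, pos))
        known_end = new_end
    return path


if __name__ == "__main__":
    # Invented site positions for four sub-libraries
    example_sites = {"R1": [0, 5200], "R4": [700, 3100], "R6": [1500], "R7": [2300]}
    for name, pos in walk(example_sites, known_end=100, target=3000):
        print(name, pos)
```

With these invented coordinates the walk uses sub-libraries R1, R4, R6 and R7 in turn, mirroring the first steps of the tiling path in the example above.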

B. Genome walking by amplification of PENTAmers from libraries 
prepared by partial digestion with one frequently cutting restriction 
endonuclease 

[0171] In this case, a redundant nick-translate DNA library is prepared by a 
partial digestion of DNA with one frequently cutting restriction endonuclease R (FIG. 3B). 
The drawing shows 21 nick-translate molecules originated from the bottom C-DNA strand. 

[0172] FIGS. 5A and 5B illustrate the preparation of primary PENTAmer library 
produced by a partial digestion of DNA with a restriction enzyme R presented in the 
Protocol 2: 

1. Protocol 2: Preparation of the primary PENTAmer library by a 
partial digestion with a frequently cutting restriction enzyme 

a. Digest DNA partially with a frequently cutting restriction
enzyme with 4 or 3 base specificity using a limited time or limited enzyme strategy, or using a
combined restriction digestion / methylation method. A skilled artisan recognizes that there
are many suitable enzymes, such as Sau3A I, Nla III, Cvi J, etc. Stop the reaction.

b. Incubate DNA samples with alkaline phosphatase for an
appropriate time to remove the phosphate group from the 5' ends of all DNA restriction
fragments (this step is optional). Purify DNA by phenol/chloroform extraction and ethanol
precipitation, or by using commercially available DNA purification kits.

c. Ligate the nick-translation adaptor A to all DNA ends. Purify DNA.

d. Fractionate DNA by gel electrophoresis to isolate fragments
larger than twice the size of a PENTAmer molecule. The PENTAmers from smaller restriction
fragments will be shorter than the expected PENTAmer size because of a premature collapse
of the two nick-translation reactions initiated at the opposite ends of the DNA fragments.

e. Incubate with a DNA polymerase possessing 5' exonuclease
activity (for example, non-mutated Taq DNA polymerase) for a specific time to synthesize
DNA molecules of a controlled size (PENT products).

f. Isolate PENT molecules by capturing on the streptavidin-
coated magnetic beads.

g. Ligate the second adaptor B to the 3' ends of immobilized 
PENT molecules. Wash. 

[0173] The PENTAmers prepared from a partially digested DNA overlap substantially
and form a highly redundant DNA library. The size fractionation step is important
because partial digestion generates DNA molecules of all sizes with about the same
probability. As a result, the PENTAmers from DNA fragments smaller than
twice the expected PENTAmer amplicon length will be shorter because of a
premature collapse of the two nick-translation reactions initiated at the opposite ends of the
DNA fragments (FIGS. 6B and 6C).

[0174] The overlapping PENTAmer library is used to walk along a chromosome.
In principle, the walking strategy would be very similar to that described in the previous section
if there is a way to selectively amplify individual PENTAmer molecules. As an example,
FIG. 3B shows 21 overlapping PENTAmer molecules from the library generated by partial
digestion of DNA with a restriction endonuclease R (only PENTAmers from the bottom
strands are illustrated). A minimal tiling path in this case can be created by a selective
amplification and sequencing of the molecules P1, P5, P9, P13, P17 and P21 from a single nick-
translate library R.
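
The choice of a minimal tiling path from an overlapping library can be sketched as a greedy interval-cover computation. The following is an illustrative sketch only: the coordinates, the 1024-base molecule length, the 250-base spacing and the 24-base primer overlap are assumptions chosen so that 21 molecules overlap in the manner of FIG. 3B; none of these values is taken from the specification.

```python
def minimal_tiling_path(molecules, min_overlap=24):
    """Greedy interval cover over (name, start, end) tuples sorted by start.

    A molecule is reachable only if it starts at least `min_overlap` bases
    before the end of the sequence read so far, leaving room for a primer.
    """
    path = []
    covered_to = molecules[0][1] + min_overlap
    i = 0
    while i < len(molecules):
        best = None
        # Scan every molecule that can be primed from already known sequence
        # and remember the one that extends farthest.
        while i < len(molecules) and molecules[i][1] <= covered_to - min_overlap:
            if best is None or molecules[i][2] > best[2]:
                best = molecules[i]
            i += 1
        if best is None:
            raise ValueError("no molecule overlaps the known sequence")
        if best[2] > covered_to:
            path.append(best[0])
            covered_to = best[2]
    return path


# Hypothetical library: 21 molecules of 1024 bases starting every 250 bases.
library = [(f"P{k}", 250 * (k - 1), 250 * (k - 1) + 1024) for k in range(1, 22)]
print(minimal_tiling_path(library))
```

With these assumed coordinates the sketch selects every fourth molecule, which matches the pattern P1, P5, P9, P13, P17 and P21 named in the text.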

[0175] As described herein, there are several ways to select and amplify a unique 
amplicon using the overlapping PENTAmer library. The present invention is also directed to 
solving the problem of sequencing complex mixtures of PENTAmers, which are easily
generated by conventional PCR.

C. PCR amplification of the overlapping PENTAmer libraries 

[0176] Amplification of overlapping PENTAmers by standard PCR using one
sequence-specific and one universal primer would result in selection and amplification of
several molecules, specifically, a nested set of DNA fragments of different lengths which
share the same priming site P (FIG. 7). For example, of the eight overlapping PENTAmer
molecules shown in FIG. 7, only the molecules ## 2 to 7 will serve as templates for a primer-
extension reaction with primer P. It is not obvious that the amplified molecules ## 2-7
(FIG. 7) could be directly used for DNA sequencing using primer P (or nested primer P') as a
sequencing primer. Two factors could potentially affect the quality and length of the resulting
sequencing ladder.

[0177] First, the bias towards a preferential amplification of the shortest DNA
fragments could reduce the length of DNA sequencing.

[0178] Second, the overlap between the universal adaptor sequence at the "fuzzy" 
end of short DNA fragments and the DNA sequence of longer fragments could result in 
ambiguities in the base calling in the region of overlap. 
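
The nested set described above can be illustrated with a short sketch that lists which overlapping molecules contain a given priming site and how long each resulting amplicon would be. The coordinates, the 920-base molecule length, the 150-base spacing, the 24-base primer and the assumption that the primer extends toward the adaptor B end at the higher coordinate are all hypothetical; they were chosen only so that molecules 2 to 7 out of eight are templates, as in the FIG. 7 example.

```python
def nested_amplicons(pentamers, primer_pos, primer_len=24):
    """Return (name, amplicon_length) for molecules containing the whole primer site.

    pentamers: list of (name, start, end); the adaptor B end is assumed to be `end`.
    The amplicon runs from the primer position to the B end of each template.
    """
    hits = []
    for name, start, end in pentamers:
        if start <= primer_pos and primer_pos + primer_len <= end:
            hits.append((name, end - primer_pos))
    return sorted(hits, key=lambda hit: hit[1])


# Eight hypothetical overlapping molecules, 920 bases long, offset by 150 bases.
molecules = [(f"#{k}", 150 * (k - 1), 150 * (k - 1) + 920) for k in range(1, 9)]
for name, length in nested_amplicons(molecules, primer_pos=950):
    print(name, length)
```

The shortest amplicons in such a set are the ones expected to dominate a standard PCR, which is the bias that the first factor refers to.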

[0179] There are several ways to minimize the number of PENTAmers which can 
be amplified using PCR. 

1. Sequence analysis by the sub-libraries approach

[0180] The method relies on the segregation of PENTAmer molecules into sub-
fractions according to the base composition of the region adjacent to the restriction site. The
segregation is achieved by selective priming and synthesis of DNA molecules using a set of
biotinylated selective primers A* and universal primer B. As in the AFLP method, selective
primers are complementary to the adaptor sequence A and the restriction site, and have
one or more extra selective bases at their 3' end. For example, the four one-base selective
primers shown in FIGS. 8A and 8B have an extra G, A, T or C base at the 3' end. Sixteen two-
base selective primers have two additional selective bases at the 3' end, and so on.

[0181] The first step involves hybridization and extension of primer-selectors
using wild type Taq DNA polymerase (FIGS. 8A and 8B). The reactions proceed in four
different tubes.

[0182] In a second step, selected molecules are immobilized on streptavidin-
coated magnetic beads and washed to remove the rest of the DNA (FIGS. 8A and 8B).

[0183] The next level of selection can be achieved by cleaving off the biotin
moiety, releasing selected molecules into solution and repeating the selection step with a new
set of selective primers. For example, after segregation of the PENTAmer library into 4 pools
"G", "A", "T", and "C" using one-base selective primers, the sub-libraries can be further
segregated into 16 pools using two-base selective primers (FIG. 9).
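
The pool arithmetic above (4 pools with one selective base, 16 with two) can be sketched as follows. This is a simplified illustration, not the laboratory procedure: the insert sequences are invented, they are assumed to be uppercase A/C/G/T strings written in the same sense as the selective primer, and the function names are hypothetical.

```python
from itertools import product


def selective_extensions(n):
    """All 4**n possible n-base 3' extensions of the selective primer."""
    return ["".join(bases) for bases in product("GATC", repeat=n)]


def segregate(inserts, n):
    """Assign each insert to the pool named by its first n bases
    (the bases immediately adjacent to the restriction site)."""
    pools = {ext: [] for ext in selective_extensions(n)}
    for seq in inserts:
        pools[seq[:n]].append(seq)
    return pools


# Invented insert sequences, for illustration only.
inserts = ["GATTACA", "GCCGTAA", "ATGCCGT", "TTAGGCA", "CGATCCA"]
one_base = segregate(inserts, 1)        # 4 pools: G, A, T, C
two_base = segregate(one_base["G"], 2)  # pool "G" split further by the second base
print(len(selective_extensions(1)), len(selective_extensions(2)))
print(two_base["GA"], two_base["GC"])
```

Each additional selective base divides the expected number of molecules per pool by four, which is why a sequence-specific primer is expected to find very few templates in a pre-selected sub-library.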

[0184] Walking with pre-selected sub-libraries is very similar to the walking
process described previously herein, in which multiple sub-libraries are created by cleavage with
multiple restriction enzymes. Amplification of a selected sub-library by standard PCR using
one sequence-specific and one universal primer would result in selection and amplification of
a very limited number of molecules, presumably just one (largest) amplicon.

2. Sequence analysis by the size fractionation approach.

[0185] Another solution to the problem is to fractionate the molecules after PCR
by size using gel electrophoresis or chromatography and to use for sequencing only DNA
molecules larger than, for example, about 800 bp. To reduce the number of samples for
preparative size fractionation, the PCR products generated by different sequence-specific
primers P1, P2, ... Pn and one universal primer-adaptor B can be pooled together, size
fractionated, aliquoted into n different tubes and re-amplified using the same primers
(FIG. 10).

[0186] The molecules for size fractionation can also be generated by n primer-
extension reactions with sequence-specific primers P1, P2, ... Pn or even by one multiplexed
polymerase-extension reaction using primers P1, P2, ... Pn combined in one tube.

3. Sequence analysis by the suppression PCR method 

[0187] An additional approach to reduce the representation of short DNA
fragments is to use suppression PCR (Siebert et al., 1995), wherein the sequence-specific
primer PS is designed to have an additional 5' sequence which is identical to the sequence of
the universal adaptor primer B (FIG. 11). The reaction is initiated by a limited number of linear
amplifications using sequence-specific suppression-PCR primer PS (FIG. 11) and completed
in suppression PCR mode with the universal primer B (FIG. 11). Because of the formation
of a specific panhandle DNA structure at the ends of DNA fragments, the amplification of the
shortest DNA fragments is suppressed and only large DNA molecules are amplified
(FIG. 11). Suppression PCR offers an additional level of selection, namely, selection
according to DNA fragment size.

4. Sequence analysis by the enzymatic pre-selection approach

[0188] It is also feasible to amplify only one nick-translate DNA molecule,
namely, the largest molecule of the nested set shown in FIG. 7, by adding an additional
enzymatic selection reaction. This type of selection can be achieved by targeted ligation-
mediated amplification. The following section describes four different protocols for
targeted PENTAmer amplification. However, prior to the targeted PENTAmer amplification,
the PENTAmers are preferably immobilized and rendered single stranded, such as is
illustrated in FIG. 12.

a. Method 1 

[0189] FIGS. 13A and 13B show the first targeted amplification method. It 
involves four major steps.

[0190] Step 1. Polymerase extension reaction with phosphorylated primer-
selector Px complementary to the left side of the restriction site Rx (FIGS. 13A and 13B).
Priming occurs internally within several overlapping PENTAmer molecules, except for
PENTAmer X, where priming occurs at the "restriction" end of the DNA fragment in the
region immediately adjacent to the adaptor sequence A.

[0191] Step 2. Ligation of the tagged oligonucleotide Pa to the 5' end of the 
extension product. Oligonucleotide Pa is complementary to the adaptor A, and it is ligated 
only to the terminally extended molecule on the targeted PENTAmer X (FIG. 13C). 

[0192] Step 3. Degradation of the template PENTAmer DNA library by
incubation with dU-glycosylase, followed by heating (FIG. 13D).

[0193] Step 4. PCR amplification using primers B and C (5' portion of the
tagged oligo Pa) (FIG. 13E).
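
The selection logic of Method 1 can be summarized in a simplified model: the primer-selector anneals to every overlapping molecule that contains the site, but the tagged oligonucleotide can be ligated only where the primer sits directly next to adaptor A. The sketch below is an assumption-laden abstraction, not a description of the chemistry: the `Pentamer` class, its integer coordinates and the rule that ligation requires the adaptor A junction to coincide exactly with the primer site are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Pentamer:
    name: str
    a_end: int  # coordinate where the insert meets adaptor A ("restriction" end)
    b_end: int  # coordinate of the "fuzzy" end carrying adaptor B


def method1_select(library, site):
    """Return (primed, ligated) molecule names for a primer-selector at `site`.

    primed:  templates on which the primer anneals (internally or terminally).
    ligated: templates on which the extended primer abuts adaptor A, so the
             tagged oligonucleotide carrying sequence C can be ligated.
    """
    primed = [p.name for p in library if p.a_end <= site < p.b_end]
    ligated = [p.name for p in library if p.a_end == site]
    return primed, ligated


# Four hypothetical overlapping molecules; only "X" starts at the targeted site.
library = [
    Pentamer("P1", 0, 1000),
    Pentamer("P2", 300, 1300),
    Pentamer("P3", 600, 1600),
    Pentamer("X", 900, 1900),
]
primed, ligated = method1_select(library, site=900)
print(primed, ligated)
```

In this model only the ligated product carries both sequence C and the complement of adaptor B, so PCR with primers B and C would amplify the single targeted molecule.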

b. Method 2 

[0194] FIGS. 14A through 14E illustrate a second protocol for the targeted
amplification of PENTAmers. It has five major steps.

[0195] Step 1. Ligation-assembly reaction using short phosphorylated
oligonucleotides P1, P2, P3 complementary to the left side of the restriction site Rx,
thermostable ligase and moderate temperature. Primer assembly occurs internally within
several overlapping PENTAmer molecules, except for PENTAmer X, where priming occurs at the
"restriction" end of the DNA fragment in the region immediately adjacent to the adaptor
sequence A (FIG. 14B).

[0196] Step 2. Polymerase extension reaction at an elevated temperature. 

[0197] Priming occurs internally within several overlapping PENTAmer
molecules, except for PENTAmer X, where priming is initiated terminally (FIG. 14C).

[0198] Step 3. Ligation of the tagged oligonucleotide Pa to the 5' end of the 
extension product. Oligonucleotide Pa is complementary to the adaptor A and it is ligated 
only to the terminally extended molecule on the targeted PENTAmer X (FIG. 14D). 

[0199] Step 4. Degradation of the template PENTAmer DNA library by 
incubation with dU-glycosylase followed by heating. 

[0200] Step 5. PCR amplification using primers B and C (5' portion of the tagged
oligo Pa) (FIG. 14E).

c. Method 3

[0201] FIGS. 15A through 15E show a third approach. It involves four major
steps.

[0202] Step 1. Ligation-assembly reaction using short phosphorylated
oligonucleotides P1, P2, P3 complementary to the left side of the restriction site Rx and the
tagged oligonucleotide Pa complementary to the adaptor A DNA sequence, thermostable
ligase and moderate temperature. Assembly of larger oligomers from oligos P1, P2, P3 occurs
internally within several overlapping PENTAmer molecules, but incorporation of the tailed
oligo Pa occurs only at the end of the PENTAmer X (FIG. 15B).

[0203] Step 2. Polymerase extension reaction at elevated temperature. Priming 
occurs internally within several overlapping PENTAmer molecules but only extension 
reaction with PENTAmer X as a template results in a full size product with Pa tail (sequence 
C) at the 5' end (FIG. 15C).

[0204] Step 3. Degradation of the template PENTAmer DNA library by
incubation with dU-glycosylase followed by heating (FIG. 15D).

[0205] Step 4. PCR amplification using primers B and C (5' portion of the tagged
oligo Pa) (FIG. 15E).

[0206] The first three selection procedures assume that:

[0207] a) PENTAmer molecules have a single stranded form; b) the strand
complementary to the primary PENTAmer is used for the selection, namely, the strand 5'B
→ 3'A (the primary PENTAmer has the opposite orientation 5'A → 3'B) (FIGS. 5A and
5B); c) molecules are immobilized through a 5'-biotin group (primer B) on the solid support
(magnetic beads); and d) a fraction of dT nucleotides is replaced with dU nucleotides during
preparation of the PENTAmer library.

[0208] Conditions a) and b) are important prerequisites of protocols ## 1, 2 and 3
for targeted PENTAmer amplification. Factor c) simplifies the removal of enzymes and
triphosphates, but it is not detrimental. Factor d) allows elimination of original templates and
reduces amplification of the non-specific products.

[0209] The first method utilizes a standard oligo-primer about 20-30 bases long for
the extension reaction. In the second approach, the primer is assembled by ligation of short
(i.e., octamer) phosphorylated target-specific oligonucleotides Pn from a pre-synthesized
oligo-library. FIGS. 14 and 15 show the assembly of only three sequence-specific
oligonucleotides P1, P2, P3, but their number can be substantially higher. The third method
combines into one step the ligation of the target-specific oligonucleotides Pn and the adaptor-
specific oligo Pa.

[0210] There are two reasons why the second and third selection protocols are
preferable to the first protocol presented in FIGS. 13A-13E. First, they allow an increase in
the stringency of the primer-extension step. Usually polymerases are more sensitive to
mismatches within the 3' region of the primer and can easily tolerate mis-pairing in the
central and 5' portions. Thermostable ligases are also better at discriminating mismatches
located at the 3' end of the oligonucleotides during their ligation. Without wishing to be
bound to one theory, the inventors believe that primer assembly by ligation of short DNA
molecules increases the specificity and the selection power of the targeted
amplification method due to the higher mismatch discrimination at multiple internal base
positions within the priming site.

[0211] Second, they offer a significant reduction of the turn-around time and cost of the
"walking" procedure. The library of all octamer oligonucleotides can be pre-synthesized, and
the whole amplification-sequencing process can be completely automated.

d. Method 4 

[0212] The fourth protocol is different in that it uses a non-immobilized DNA
library and adds an additional selection step at the level of affinity capture of the ligation-
selected primer-extended PENTAmer molecules (FIGS. 16A through 16E). Otherwise, it is
similar to Method 1. FIGS. 16A through 16E show the fourth targeted amplification
method, which involves five major steps.

[0213] Step 1. Polymerase extension reaction with phosphorylated primer-
selector P complementary to the left side of the restriction site R and Bst (heat sensitive)
DNA polymerase (FIGS. 16A and 16B).

[0214] Priming occurs internally within several overlapping PENTAmer 
molecules except PENTAmer X where priming occurs at the "restriction" end of the DNA 
fragment in the region immediately adjacent to the adaptor sequence A. 

[0215] Step 2. Heat inactivation of Bst DNA polymerase (FIG. 16C).

[0216] Step 3. Ligation of the tagged oligonucleotide Pa to the 5' end of the 
extension product. Oligonucleotide Pa is complementary to the adaptor A and it is ligated 
only to the terminally extended molecule on the targeted PENTAmer X (FIG. 16D). 

[0217] Step 4. Magnetic bead capture of the targeted PENTAmer X (FIG. 16E). 

[0218] Step 5. PCR amplification using primers B and C (5' portion of the tagged
oligo Pa) or B and a (FIG. 16F).

e. Removal of dU-containing DNA molecules

[0219] A skilled artisan recognizes that it would be useful to separate a desired
molecule, or more than one, from an undesired molecule, or more than one. For example, in
the present invention it is useful to separate a selected primer extended molecule from a
library of nick translate molecules. A skilled artisan is aware of a variety of means to achieve
this, but in the present invention it is preferred to polymerize nick translate molecules in the
presence of dU nucleotides, and to polymerize the desired primer extension molecule
without incorporation of dU. In a preferred embodiment, this occurs in the absence of dU
nucleotides in the reaction mixture. The dU-containing molecules are then subjected to a dU
glycosylase, such as AmpErase Uracil N-glycosylase (UNG) (Applied Biosystems, Foster
City, CA). When dUTP is substituted for dTTP in PCR amplification, exposure to UNG
prevents the subsequent reamplification of dU-containing PCR products. UNG acts on
single- or double-stranded dU-containing DNA by hydrolysis of uracil-glycosidic bonds
(base excision) at dU-containing DNA sites, releasing uracil and creating an alkali-sensitive
apyrimidinic site in the DNA. Thus, uracil N-glycosylase can be used to cleave DNA at any
position where a deoxyuridine triphosphate has been incorporated.

D. Direct sequencing approach 

[0220] Surprisingly, the inventors determined that the complex mixtures of nested
molecules generated by PCR using one sequence-specific and one universal primer can be
directly used for sequence analysis. Example 6 and FIG. 5 show 55 different loci in the
bacterial genome amplified using the PENTAmer library prepared by a partial digestion of
the E. coli genomic DNA with the Sau3A I restriction enzyme (Example 5), universal primer
B (Table VII) and 40 E. coli-specific primers (Table VII). As expected, the electrophoretic
profiles show a complex multi-band pattern with a maximum size of 1 kb (the PENTAmer
size). The PCR products were subjected to the cycle sequencing protocol using
fluorescent dye-terminators and the same primers as used for PCR, and the sequencing
data were analyzed using the MegaBACE capillary DNA sequencer (Amersham; Piscataway, NJ).

[0221] The adaptor B sequence, which is located at different distances for
different fragments, does not noticeably affect the quality of the sequencing data. FIG. 17
shows the simplest case of only two overlapping fragments L (large) and S (short). It is
expected that in the "bad" region, where the sequence of the fragment L overlaps with
adaptor sequence B, the sequencing can be problematic. However, in the overlap area
indicated by two vertical dashed lines, a total of 18 DNA templates (L1-L13 from the larger DNA
fragment L plus S1-S5 from the shorter fragment S) produce a correct DNA sequencing ladder.
Only 3 DNA templates (B6-B8) will produce an unreadable signal generated by adaptor
sequences B. The expected noise-to-signal ratio in the area is only about 3/18 = 17%.

[0222] In reality, the contribution of the adaptor DNA is very small for
two reasons: the small size of the B region and the diffuse position of the "fuzzy" end with
respect to the DNA priming site. If one assumes the same width of size distribution for both
"fragments," there is the same number of molecules within a specific size sub-
interval. For example, for the interval shown in FIG. 17 by two dashed vertical lines, the
total number of molecules with a correct DNA sequence is equal to 13 "molecules"
originating from the "fragment" L plus 5 molecules originating from the "fragment" S, for a
total of 18. The number of short "fragments" within the same interval is equal to 3,
giving a ratio of 0.17 for the contribution of the "bad" sequence B to the "good" signal.
In practice, it can be estimated as the ratio between the adaptor B sequence length and the
width of the PENTAmer size distribution. The latter is herein estimated as 150 bp and B is
about 22 bp, giving a ratio of 0.15, very close to the hypothetical example shown in
FIG. 17.
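
The two estimates in the preceding paragraphs can be checked with a few lines of arithmetic. The numbers below are the ones quoted in the text (13 + 5 correct templates, 3 adaptor-derived templates, a 22 bp adaptor B and a 150 bp size distribution); the variable names are illustrative only.

```python
# Template counts within the interval marked by the dashed lines in FIG. 17.
good_templates = 13 + 5   # correct ladder from fragment L plus fragment S
bad_templates = 3         # templates reading into adaptor sequence B
ratio_from_counts = bad_templates / good_templates

# Practical estimate: adaptor length over the width of the size distribution.
adaptor_b_length_bp = 22
size_distribution_width_bp = 150
ratio_from_lengths = adaptor_b_length_bp / size_distribution_width_bp

print(round(ratio_from_counts, 2))   # 0.17
print(round(ratio_from_lengths, 2))  # 0.15
```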

[0223] The diffuse size distribution of the PENTAmer molecules is inherent to the
nick-translation process, and it is useful. It is sufficiently narrow to allow one to control the
average size of PENTAmers, and it is broad enough to minimize the effect of the B adaptor
on the quality of DNA sequencing. It is clear that the contribution of the B sequence can be
further minimized by shortening it or even by complete physical elimination of the
terminal B sequence from the ends of amplified DNA templates. The latter can be achieved
a) by a limited trimming of DNA samples after PCR with a 5' exonuclease (λ exonuclease
or T7 gene 6 exonuclease); and/or b) by incorporation of the dU nucleotide or a
ribonucleotide into the 3' portion of the B primer sequence and degradation of the B
sequence using dU-glycosylase and/or alkaline hydrolysis, respectively.

E. Applications of the PENTAmer Chromosome Walking Technology

1. Filling gaps in genome sequencing projects 

[0224] It is obvious that the PENTAmer walking method described herein can be
directly applied to fill gaps left after the shotgun phase. Usually, there are about 200-300 gaps
in a bacterial sequencing project following sequencing at 6-7-fold redundancy. The human
genome project currently has about 150,000 gaps. FIG. 18 illustrates the sequencing of gaps
in a genome, such as a bacterial genome, using primary PENTAmer libraries.

2. 1-2 time redundancy genomic sequencing

[0225] The PENTAmer walking technology can be used to sequence bacterial
genomes with minimal redundancy. For example, in a first phase the genome can be
sequenced randomly with 1-fold redundancy and then finished using a PENTAmer library.
Because the library preparation is cheap, the cost would mostly be determined by the cost of
one sequence-specific oligonucleotide, which is about $2-3 for a 24-mer. That means that at
about 600 bases obtained at each step, the oligo cost per base is going to be about 0.5 cent, plus
an additional 0.5-1 cent per base for routine sequencing operations.
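
The cost estimate can be reproduced with a short calculation. The inputs are the figures quoted in the paragraph above ($2-3 per 24-mer, about 600 bases per walking step, 0.5-1 cent per base for routine sequencing); the function name and its parameters are illustrative.

```python
def cost_per_base_cents(oligo_cost_usd, bases_per_step, sequencing_cents_per_base):
    """Oligo cost spread over one walking step plus the routine sequencing cost."""
    oligo_cents_per_base = oligo_cost_usd * 100 / bases_per_step
    return oligo_cents_per_base + sequencing_cents_per_base


# Oligo contribution alone at the upper price quoted in the text.
print(cost_per_base_cents(3, 600, 0))
# Lower and upper bounds of the total per-base cost.
print(round(cost_per_base_cents(2, 600, 0.5), 2), cost_per_base_cents(3, 600, 1.0))
```

The 0.5 cent figure in the text corresponds to the upper oligonucleotide price of $3; at $2 the oligonucleotide contributes about 0.33 cent per base.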

3. Sequencing unculturable microorganisms 

[0226] The fact that the bacterial PENTAmer library can be diluted up to 1000
times, amplified and used for recovery of DNA sequence information suggests that the method
is suitable for making libraries from a small amount of starting material, for example, from
unculturable bacteria or when other factors limit the amount of DNA.

4. Sequencing mixtures of microorganisms 

[0227] To the extent that the technology is applied to sequence more complex genomes,
the PENTAmer libraries can be prepared from a complex mixture of different
microorganisms. In this case, the walking process will allow (with some limitations)
sequencing of individual genomes within a mix with other DNA.

[0228] Thus, as described in the previous sections, the fundamental nature of the 
present invention is illustrated in FIG. 19, wherein positional genome walking occurs by 
targeted PENTAmer amplification. 

[0229] The next sections provide a brief overview of materials and techniques that
a person of ordinary skill would deem important to the practice of the invention. These
sections are followed by a more detailed description of the various embodiments of the
invention.

III. NUCLEIC ACIDS

[0230] Genes are sequences of DNA in an organism's genome encoding 
information that is converted into various products making up a whole cell. They are 
expressed by the process of transcription, which involves copying the sequence of DNA into 
RNA. Most genes encode information to make proteins, but some encode RNAs involved in 
other processes. If a gene encodes a protein, its transcription product is called mRNA
("messenger" RNA). After transcription in the nucleus (where DNA is located), the mRNA
must be transported into the cytoplasm for the process of translation, which converts the code
of the mRNA into a sequence of amino acids to form protein. In order to direct transport into
the cytoplasm, the 3' ends of mRNA molecules are post-transcriptionally modified by
addition of several adenylate residues to form the "polyA" tail. This characteristic
modification distinguishes gene expression products destined to make protein from other
molecules in the cell, and thereby provides one means for detecting and monitoring the gene 
expression activities of a cell. 

[0231] The term "nucleic acid" will generally refer to at least one molecule or
strand of DNA, RNA or a derivative or mimic thereof, comprising at least one nucleobase,
such as, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g.
adenine "A," guanine "G," thymine "T" and cytosine "C") or RNA (e.g. A, G, uracil "U" and
C). The term "nucleic acid" encompasses the terms "oligonucleotide" and "polynucleotide."
The term "oligonucleotide" refers to at least one molecule of between about 3 and about 100
nucleobases in length. The term "polynucleotide" refers to at least one molecule of greater
than about 100 nucleobases in length. These definitions generally refer to at least one single-
stranded molecule, but in specific embodiments will also encompass at least one additional
strand that is partially, substantially or fully complementary to the at least one single-stranded
molecule. Thus, a nucleic acid may encompass at least one double-stranded molecule or at
least one triple-stranded molecule that comprises one or more complementary strand(s) or
"complement(s)" of a particular sequence comprising a strand of the molecule. As used
herein, a single stranded nucleic acid may be denoted by the prefix "ss", a double stranded
nucleic acid by the prefix "ds", and a triple stranded nucleic acid by the prefix "ts."

[0232] Nucleic acid(s) that are "complementary" or "complement(s)" are those
that are capable of base-pairing according to the standard Watson-Crick, Hoogsteen or
reverse Hoogsteen binding complementarity rules. As used herein, the term
"complementary" or "complement(s)" also refers to nucleic acid(s) that are substantially
complementary, as may be assessed by the same nucleotide comparison set forth above. The
term "substantially complementary" refers to a nucleic acid comprising at least one sequence
of consecutive nucleobases, or semiconsecutive nucleobases if one or more nucleobase
moieties are not present in the molecule, that is capable of hybridizing to at least one nucleic
acid strand or duplex even if less than all nucleobases base pair with a counterpart
nucleobase. In certain embodiments, a "substantially complementary" nucleic acid contains
at least one sequence in which about 70%, about 71%, about 72%, about 73%, about 74%,
about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about
81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%,
about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about
96%, about 97%, about 98%, about 99%, to about 100%, and any range therein, of the
nucleobase sequence is capable of base-pairing with at least one single or double stranded
nucleic acid molecule during hybridization. In certain embodiments, the term "substantially
complementary" refers to at least one nucleic acid that may hybridize to at least one nucleic
acid strand or duplex in stringent conditions. In certain embodiments, a "partly
complementary" nucleic acid comprises at least one sequence that may hybridize in low
stringency conditions to at least one single or double stranded nucleic acid, or contains at
least one sequence in which less than about 70% of the nucleobase sequence is capable of
base-pairing with at least one single or double stranded nucleic acid molecule during
hybridization.
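
The percentage criterion above can be illustrated with a simplified calculation. The sketch below is an assumption-laden model: it considers only Watson-Crick DNA pairs, compares two strands of equal length in antiparallel orientation without gaps, and applies the "about 70%" figure as a hard threshold; the function names and example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementary(probe, target):
    """Percentage of probe bases that pair with the target.

    Both strands are written 5'->3' and must have equal length; the target is
    read in reverse so that the strands are compared antiparallel.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    pairs = sum(COMPLEMENT[p] == t for p, t in zip(probe, reversed(target)))
    return 100.0 * pairs / len(probe)


def classify(probe, target, threshold=70.0):
    """Label a pair using the 'about 70%' boundary from the text."""
    pct = percent_complementary(probe, target)
    return "substantially complementary" if pct >= threshold else "partly complementary"


probe = "AAGGCCTTAC"
print(percent_complementary(probe, "GTAAGGCCTT"))  # exact reverse complement
print(classify(probe, "GTAAGGCCAA"))               # two mismatches
print(classify(probe, "GTAACCGGAA"))               # six mismatches
```

In practice the definition also covers Hoogsteen pairing, RNA and hybridization behavior under defined conditions, none of which this sketch attempts to model.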

[0233] As used herein, "hybridization", "hybridizes" or "capable of hybridizing"
is understood to mean the forming of a double or triple stranded molecule or a molecule with
partial double or triple stranded nature. The term "hybridization", "hybridize(s)" or "capable
of hybridizing" encompasses the terms "stringent condition(s)" or "high stringency" and the
terms "low stringency" or "low stringency condition(s)."

[0234] As used herein "stringent condition(s)" or "high stringency" are those that
allow hybridization between or within one or more nucleic acid strand(s) containing
complementary sequence(s), but preclude hybridization of random sequences. Stringent
conditions tolerate little, if any, mismatch between a nucleic acid and a target strand. Such
conditions are well known to those of ordinary skill in the art, and are preferred for 
applications requiring high selectivity. Non-limiting applications include isolating at least 
one nucleic acid, such as a gene or nucleic acid segment thereof, or detecting at least one 
specific mRNA transcript or nucleic acid segment thereof, and the like. 

[0235] Stringent conditions may comprise low salt and/or high temperature
conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about
50°C to about 70°C. It is understood that the temperature and ionic strength of a desired
stringency are determined in part by the length of the particular nucleic acid(s), the length and
nucleobase content of the target sequence(s), the charge composition of the nucleic acid(s),
and the presence of formamide, tetramethylammonium chloride or other solvent(s) in the
hybridization mixture. It is generally appreciated that conditions may be rendered more
stringent, such as, for example, by the addition of increasing amounts of formamide.

[0236] It is also understood that these ranges, compositions and conditions for
hybridization are mentioned by way of non-limiting example only, and that the desired
stringency for a particular hybridization reaction is often determined empirically by
comparison to one or more positive or negative controls. Depending on the application
envisioned it is preferred to employ varying conditions of hybridization to achieve varying
degrees of selectivity of the nucleic acid(s) towards target sequence(s). In a non-limiting
example, identification or isolation of related target nucleic acid(s) that do not hybridize to a
nucleic acid under stringent conditions may be achieved by hybridization at low temperature
and/or high ionic strength. Such conditions are termed "low stringency" or "low stringency
conditions", and non-limiting examples of low stringency include hybridization performed at
about 0.15 M to about 0.9 M NaCl at a temperature range of about 20°C to about 50°C. Of
course, it is within the skill of one in the art to further modify the low or high stringency
conditions to suit a particular application.
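
The example ranges given above can be captured in a small lookup. This is only a sketch of the quoted non-limiting ranges: the text gives them as approximate ("about"), states that stringency is usually determined empirically, and leaves the shared boundary (0.15 M NaCl, 50°C) ambiguous; the function below arbitrarily treats that boundary as high stringency and ignores formamide and other solvents.

```python
def stringency(nacl_molar, temp_c):
    """Classify hybridization conditions using the example ranges in the text."""
    if 0.02 <= nacl_molar <= 0.15 and 50 <= temp_c <= 70:
        return "high"
    if 0.15 <= nacl_molar <= 0.9 and 20 <= temp_c <= 50:
        return "low"
    return "unclassified"


print(stringency(0.05, 65))  # low salt, high temperature
print(stringency(0.5, 37))   # high salt, low temperature
print(stringency(1.5, 37))   # outside both example ranges
```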

[0237] As used herein a "nucleobase" refers to a naturally occurring heterocyclic
base, such as A, T, G, C or U ("naturally occurring nucleobase(s)"), found in at least one
naturally occurring nucleic acid (i.e. DNA and RNA), and their naturally or non-naturally
occurring derivatives and mimics. Non-limiting examples of nucleobases include purines and
pyrimidines, as well as derivatives and mimics thereof, which generally can form one or more
hydrogen bonds ("anneal" or "hybridize") with at least one naturally occurring nucleobase in
a manner that may substitute for naturally occurring nucleobase pairing (e.g. the hydrogen
bonding between A and T, G and C, and A and U).

[0238] As used herein, a "nucleotide" refers to a nucleoside further comprising a 
"backbone moiety" generally used for the covalent attachment of one or more nucleotides to 
another molecule or to each other to form one or more nucleic acids. The "backbone moiety"
in naturally occurring nucleotides typically comprises a phosphorus moiety, which is 
covalently attached to a 5-carbon sugar. The attachment of the backbone moiety typically 
occurs at either the 3'- or 5'-position of the 5-carbon sugar. However, other types of 
attachments are known in the art, particularly when the nucleotide comprises derivatives or
mimics of a naturally occurring 5-carbon sugar or phosphorus moiety, and non-limiting 
examples are described herein. 
IV. RESTRICTION ENZYMES

[0239] Restriction-enzymes recognize specific short DNA sequences four to eight
nucleotides long (see Table I), and cleave the DNA at a site within this sequence. In the
context of the present invention, restriction enzymes are used to cleave DNA molecules at
sites corresponding to various restriction-enzyme recognition sites. The site may be
specifically modified to allow for the initiation of the PENT reaction. In another
embodiment, if the sequence of the recognition site is known, primers can be designed
comprising nucleotides corresponding to the recognition sequences. These primers, further
comprising PENT initiation sites, may be ligated to the digested DNA.

[0240] Restriction-enzymes recognize specific short DNA sequences four to eight 
nucleotides long (see Table I), and cleave the DNA at a site within this sequence. In the 
context of the present invention, restriction enzymes are used to cleave cDNA molecules at 
sites corresponding to various restriction-enzyme recognition sites. Frequently cutting
enzymes, such as the four-base cutter enzymes, are preferred as this yields DNA fragments
that are in the right size range for subsequent amplification reactions. Some of the preferred
four-base cutters are NlaIII, DpnII, Sau3AI, Hsp92II, MboI, NdeII, Bsp143I, Tsp509I, HhaI,
HinP1I, HpaII, MspI, Taq alphaI, MaeII or K2091.
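
The preference for frequent cutters can be illustrated with a rough estimate of how often a recognition sequence is expected to occur. The sketch below is illustrative only and is not part of the described methods. It assumes a random sequence in which the four bases are equally frequent, which real genomes only approximate, and the function name expected_site_spacing is hypothetical.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def expected_site_spacing(recognition_sequence: str) -> float:
    """Mean distance (bp) between sites in a random, equal-frequency sequence."""
    probability = 1.0
    for code in recognition_sequence.upper():
        # Each position matches with probability (allowed bases) / 4.
        probability *= IUPAC_DEGENERACY[code] / 4.0
    return 1.0 / probability


if __name__ == "__main__":
    for name, site in [("Sau3AI", "GATC"), ("EcoRI", "GAATTC"), ("NotI", "GCGGCCGC")]:
        print(f"{name:8s} {site:10s} ~{expected_site_spacing(site):,.0f} bp")
```

Under these assumptions a four-base cutter is expected to cut about once every 256 bp, a six-base cutter about once every 4,096 bp and an eight-base cutter about once every 65,536 bp, which is why four-base cutters tend to give fragments suited to amplification.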

[0241] As the sequence of the recognition site is known (see list below), primers 
can be designed comprising nucleotides corresponding to the recognition sequences. If the 
primer sets have, in addition to the restriction recognition sequence, degenerate sequences
corresponding to different combinations of nucleotide sequences, one can use the primer set
to amplify DNA fragments that have been cleaved by the particular restriction enzyme. The
list below exemplifies the currently known restriction enzymes that may be used in the 
invention. 

TABLE I: RESTRICTION ENZYMES

Enzyme Name   Recognition Sequence

AatII         GACGTC
Acc65I        GGTACC
AccI          GTMKAC
AciI          CCGC
AclI          AACGTT
AfeI          AGCGCT
AflII         CTTAAG
AflIII        ACRYGT
AgeI          ACCGGT
AhdI          GACNNNNNGTC
AluI          AGCT
AlwI          GGATC
AlwNI         CAGNNNCTG
ApaI          GGGCCC
ApaLI         GTGCAC
ApoI          RAATTY
AscI          GGCGCGCC
AseI          ATTAAT
AvaI          CYCGRG
AvaII         GGWCC
AvrII         CCTAGG
BaeI          NACNNNNGTAPyCN
BamHI         GGATCC
BanI          GGYRCC
BanII         GRGCYC
BbsI          GAAGAC
BbvI          GCAGC
BbvCI         CCTCAGC
BcgI          CGANNNNNNTGC
BciVI         GTATCC
BclI          TGATCA
BfaI          CTAG
BglI          GCCNNNNNGGC
BglII         AGATCT
BlpI          GCTNAGC
BmrI          ACTGGG
BpmI          CTGGAG
BsaAI         YACGTR
BsaBI         GATNNNNATC
BsaHI         GRCGYC
BsaI          GGTCTC
BsaJI         CCNNGG
BsaWI         WCCGGW
BseRI         GAGGAG
BsgI          GTGCAG
BsiEI         CGRYCG
BsiHKAI       GWGCWC
BsiWI         CGTACG
BslI          CCNNNNNNNGG
BsmAI         GTCTC
BsmBI         CGTCTC
BsmFI         GGGAC
BsmI          GAATGC
BsoBI         CYCGRG
Bsp1286I      GDGCHC
BspDI         ATCGAT
BspEI         TCCGGA
BspHI         TCATGA
BspMI         ACCTGC
BsrBI         CCGCTC
BsrDI         GCAATG
BsrFI         RCCGGY
BsrGI         TGTACA
BsrI          ACTGG
BssHII        GCGCGC
BssKI         CCNGG
Bst4CI        ACNGT
BssSI         CACGAG
BstAPI        GCANNNNNTGC
BstBI         TTCGAA
BstEII        GGTNACC
BstF5I        GGATGNN
BstNI         CCWGG
BstUI         CGCG
BstXI         CCANNNNNNTGG
BstYI         RGATCY
BstZ17I       GTATAC
Bsu36I        CCTNAGG
BtgI          CCPuPyGG
BtrI          CACGTG
Cac8I         GCNNGC
ClaI          ATCGAT
DdeI          CTNAG
DpnI          GATC
DpnII         GATC
DraI          TTTAAA
DraIII        CACNNNGTG
DrdI          GACNNNNNNGTC
EaeI          YGGCCR
EagI          CGGCCG
EarI          CTCTTC
EciI          GGCGGA
EcoNI         CCTNNNNNAGG
EcoO109I      RGGNCCY
EcoRI         GAATTC
EcoRV         GATATC
FauI          CCCGCNNNN
Fnu4HI        GCNGC
FokI          GGATG
FseI          GGCCGGCC
FspI          TGCGCA
HaeII         RGCGCY
HaeIII        GGCC
HgaI          GACGC
HhaI          GCGC
HincII        GTYRAC
HindIII       AAGCTT
HinfI         GANTC
HinP1I        GCGC
HpaI          GTTAAC
HpaII         CCGG
HphI          GGTGA
KasI          GGCGCC
KpnI          GGTACC
MboI          GATC
MboII         GAAGA
MfeI          CAATTG
MluI          ACGCGT
MlyI          GAGTCNNNNN
MnlI          CCTC
MscI          TGGCCA
MseI          TTAA
MslI          CAYNNNNRTG
MspA1I        CMGCKG
MspI          CCGG
MwoI          GCNNNNNNNGC
NaeI          GCCGGC
NarI          GGCGCC
NciI          CCSGG
NcoI          CCATGG
NdeI          CATATG
NgoMIV        GCCGGC
NheI          GCTAGC
NlaIII        CATG
NlaIV         GGNNCC
NotI          GCGGCCGC
NruI          TCGCGA
NsiI          ATGCAT
NspI          RCATGY
PacI          TTAATTAA
PaeR7I        CTCGAG
PciI          ACATGT
PflFI         GACNNNGTC
PflMI         CCANNNNNTGG
PleI          GAGTC
PmeI          GTTTAAAC
PmlI          CACGTG
PpuMI         RGGWCCY
PshAI         GACNNNNGTC
PsiI          TTATAA
PspGI         CCWGG
PspOMI        GGGCCC
PstI          CTGCAG
PvuI          CGATCG
PvuII         CAGCTG
RsaI          GTAC
RsrII         CGGWCCG
SacI          GAGCTC
SacII         CCGCGG
SalI          GTCGAC
SapI          GCTCTTC
Sau3AI        GATC
Sau96I        GGNCC
SbfI          CCTGCAGG
ScaI          AGTACT
ScrFI         CCNGG
SexAI         ACCWGGT
SfaNI         GCATC
SfcI          CTRYAG
SfiI          GGCCNNNNNGGCC
SfoI          GGCGCC
SgrAI         CRCCGGYG
SmaI          CCCGGG
SmlI          CTYRAG
SnaBI         TACGTA
SpeI          ACTAGT
SphI          GCATGC
SspI          AATATT
StuI          AGGCCT
StyI          CCWWGG
SwaI          ATTTAAAT
TaqI          TCGA
TfiI          GAWTC
TliI          CTCGAG
TseI          GCWGC
Tsp45I        GTSAC
Tsp509I       AATT
TspRI         CAGTG
Tth111I       GACNNNGTC
XbaI          TCTAGA
XcmI          CCANNNNNNNNNTGG
XhoI          CTCGAG
XmaI          CCCGGG
XmnI          GAANNNNTTC
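
Many of the recognition sequences in Table I use degenerate single-letter codes (for example R for A or G, Y for C or T, N for any base). As a non-limiting sketch of how such a sequence might be located in a known stretch of DNA, for instance when choosing where a primer comprising the recognition sequence could anneal, the code below expands the standard IUPAC codes into a regular expression. The function name find_sites and the test sequences are hypothetical; entries written with Pu or Py would first need converting to R or Y, and only the strand supplied is scanned.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_sites(dna: str, recognition_sequence: str) -> list:
    """Return 0-based start positions of every match, including overlapping ones."""
    body = "".join(IUPAC_TO_REGEX[code] for code in recognition_sequence.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + body + ")")
    return [match.start() for match in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    print(find_sites("AAGATCTTGATCC", "GATC"))   # Sau3AI-type site
    print(find_sites("TTGTCGACAA", "GTMKAC"))    # AccI-type degenerate site
```

For a non-palindromic recognition sequence the reverse complement of the sequence would also have to be searched; that step is omitted here for brevity.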



[0242] Furthermore, a skilled artisan recognizes that it may be useful in the 
present invention to selectively render particular restriction enzyme sites uncleavable, such as 
by methylation of the recognition site prior to exposure to certain methylation-sensitive 
restriction enzymes. A skilled artisan recognizes that, for example, the dam and dcm genes of
E. coli encode gene products which are methylases that methylate a nucleic acid in their
specific recognition sequence. Some enzymes will not cleave methylated sites, whereas other
enzymes, such as Dpn I, have a requirement for methylation at the recognition site.
Examples of different classes of methylation requirements for specific enzymes are in Table
II as follows:

TABLE II: CpG METHYLATION AND ENZYME CLEAVAGE

Cleavage Blocked at All Sites

AatII       GACGTC       BsrFI       RCCGGY       HaeII       RGCGCY       NruI        TCGCGA
AciI        CCGC         BssHII      GCGCGC       HgaI        GACGC        PmlI        CACGTG
AgeI        ACCGGT       BstBI       TTCGAA       HhaI        GCGC         Psp1406I    AACGTT
AhaII       GRCGYC       BstUI       CGCG         HinP1I      GCGC         PvuI        CGATCG
AscI        GGCGCGCC     Cfr10I      RCCGGY       HpaII       CCGG         RsrII       CGGWCCG
AvaI        CYCGRG       ClaI        ATCGAT       KasI        GGCGCC       SacII       CCGCGG
BsaAI       YACGTR       EagI        CGGCCG       MluI        ACGCGT       SalI        GTCGAC
BsaHI       GRCGYC       Eco47III    AGCGCT       NaeI        GCCGGC       SmaI        CCCGGG
BsiEI       CGRYCG       Esp3I       CGTCTC(1/5)  NarI        GGCGCC       SnaBI       TACGTA
BsiWI       CGTACG       FseI        GGCCGGCC     NgoMIV      GCCGGC       TaiI        ACGT
BspDI       ATCGAT       FspI        TGCGCA       NotI        GCGGCCGC     XhoI        CTCGAG

Cleavage Blocked Only at Sites with Overlapping CG

AccI        GTMKAC       BanI        GGYRCC       Bsp120I     GGGCCC       NheI        GCTAGC
Acc65I      GGTACC       BsaBI       GATN4ATC     Bst1107I    GTATAC       RsaI        GTAC
Alw26I      GTCTC        BsgI        GTGCAG       DrdI        GACN6GTC     PshAI       GACNNNNGTC
ApaI        GGGCCC       BslI        CCN7GG       EaeI        YGGCCR       Sau3AI      GATC
ApaLI       GTGCAC       BsmAI       GTCTC        [illegible] GAGCTC       Sau96I      GGNCC
AvaII       GGWCC        BsoFI       GCNGC        HpaI        GTTAAC

Cleavage Not Blocked at Sites with Overlapping CG

BamHI       GGATCC       BsrBI       GAGCGG       EcoRV       GATATC       PmeI        GTTTAAAC
BanII       GRGCYC       BstEII      GGTNACC      FokI        GGATG        SacI        GAGCTC
BbsI        GAAGAC       BstYI       RGATCY       HaeIII      GGCC         SfaNI       GCATC
BsaJI       CCNNGG       Csp6I       GTAC         HgiAI       GWGCWC       SphI        GCATGC
BsaWI       WCCGGW       Eam1105I    GACN5GTC     HphI        GGTGA        TaqI        TCGA
BsmI        GAATGC       EarI        CTCTTC       KpnI        GGTACC       TfiI        GAWTC
Bsp1286I    GDGCHC       EcoO109I    RGGNCCY      MspI        CCGG         Tth111I     GACN3GTC
BspEI       TCCGGA       EcoRI       GAATTC       PaeR7I      CTCGAG       XmaI        CCCGGG
BspMI       ACCTGC

Examples of restriction enzyme sites sensitive to Dam and Dcm methylation in
particular are in Table III as follows:

TABLE III: DAM AND DCM METHYLATION

Dam Methylation: GmATC

Blocked by Overlapping Dam:

AlwI        GGATC
BclI        TGATCA
BsaBI       GATCNNNATC
BspDI       ATCGATC
BspEI       TCCGGATC
BspHI       TCATGATC
ClaI        ATCGATC
DpnII       GATC
HphI        GGTGATC
MboI        GATC
MboII       GAAGATC
NruI        TCGCGATC
TaqI        TCGATC
XbaI        TCTAGATC

Not Blocked by Overlapping Dam:

BamHI       GGATCC
BglII       AGATCT
BspMII      TCCGGATC
BstYI       (A/G)GATC(C/T)
PvuI        CGATCG
Sau3AI      GATC

Dcm Methylation: CmC(A/T)GG

Blocked by Overlapping Dcm:

Acc65I      GGTACC(A/T)GG
AlwNI       CAGNNCCTGG
ApaI        GGGCCC(A/T)GG
AvaII       GG(A/T)CC(A/T)GG
BalI        TGGCCAGG
BpmI        CCTGGAG
BslI        CC(A/T)GGNNNNGG
Bsp120I     GGGCCC(A/T)GG
BssKI       CC(A/T)GG
EaeI        (C/T)GGCCAGG
EcoO109I    (A/G)GGNCCTGG
EcoRII      CC(A/T)GG
MscI        TGGCCAGG
PflMI       CCAGGNNNTGG
PpuMI       (A/G)GG(A/T)CCTGG
Sau96I      GGNCC(A/T)GG
ScrFI       CC(A/T)GG
SexAI       ACC(A/T)GGT
SfiI        GGCC(A/T)GGNNGGCC
StuI        AGGCCTGG

Not Blocked by Overlapping Dcm:

BanII       G(A/G)GCCC(A/T)GG
BglI        GCC(A/T)GGNNGGC
BsaJI       CC(A/T)GGG
Bsp1286I    G(A/G/T)GCCC(A/T)GG
BstNI       CC(A/T)GG
BstEII      GGTNACC(A/T)GG
EheI        GGCGCC(A/T)GG
HaeIII      GGCC(A/T)GG
KpnI        GGTACC(A/T)GG
NarI        GGCGCC(A/T)GG
SfiI        GGCCNNNNNGGCC(A/T)GG



[0243] Other examples of methylation-sensitive enzymes, which may not be listed
here, are obtainable by a skilled artisan. 
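
Whether a given restriction site overlaps a Dam site (GATC) depends on the flanking sequence, so it has to be checked site by site. The following is a minimal sketch of such a check, offered only as an illustration. It assumes a non-degenerate recognition sequence, scans only the strand supplied, and merely reports the overlap; whether the overlap actually blocks cleavage depends on the enzyme. The names find_all and dam_overlapping_sites and the test sequences are hypothetical.

```python
DAM_SITE = "GATC"


def find_all(dna: str, motif: str) -> list:
    """Return 0-based start positions of every occurrence, including overlapping ones."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions


def dam_overlapping_sites(dna: str, recognition_sequence: str) -> list:
    """Return (start, overlaps_dam) for each occurrence of the recognition sequence."""
    dna = dna.upper()
    site = recognition_sequence.upper()
    dam_positions = find_all(dna, DAM_SITE)
    results = []
    for start in find_all(dna, site):
        end = start + len(site)
        # Two intervals overlap when each begins before the other ends.
        overlaps = any(g < end and g + len(DAM_SITE) > start for g in dam_positions)
        results.append((start, overlaps))
    return results


if __name__ == "__main__":
    # A ClaI-type site (ATCGAT) followed by C creates an overlapping GATC.
    print(dam_overlapping_sites("TTATCGATCAA", "ATCGAT"))
    print(dam_overlapping_sites("TTATCGATGAA", "ATCGAT"))
```

A comparable check for Dcm sites would need the degenerate pattern CC(A/T)GG rather than a fixed string.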

V. OTHER ENZYMES

[0244] Other enzymes that may be used in conjunction with the invention include 
nucleic acid modifying enzymes listed in the following tables.

TABLE IV: POLYMERASES AND REVERSE TRANSCRIPTASES

Thermostable DNA Polymerases:

OmniBase™ Sequencing Enzyme
Pfu DNA Polymerase
Taq DNA Polymerase
Taq DNA Polymerase, Sequencing Grade
TaqBead™ Hot Start Polymerase
AmpliTaq Gold
Tfl DNA Polymerase
Tli DNA Polymerase
Tth DNA Polymerase



WO 02/103054    PCT/US01/44970



DNA Polymerases:

DNA Polymerase I, Klenow Fragment, Exonuclease Minus
DNA Polymerase I
DNA Polymerase I Large (Klenow) Fragment
Terminal Deoxynucleotidyl Transferase
T4 DNA Polymerase

Reverse Transcriptases:

AMV Reverse Transcriptase
M-MLV Reverse Transcriptase

TABLE V: DNA/RNA MODIFYING ENZYMES

Ligases:

T4 DNA Ligase

Kinases:

T4 Polynucleotide Kinase

VI. DNA POLYMERASES

[0245] In the context of the present invention it is generally contemplated that the 
DNA polymerase will retain 5'-3' exonuclease activity. Nevertheless, it is envisioned that the
methods of the invention could be carried out with one or more enzymes where multiple
enzymes combine to carry out the function of a single DNA polymerase molecule retaining
5'-3' exonuclease activity. Effective polymerases which retain 5'-3' exonuclease activity
include, for example, E. coli DNA polymerase I, Taq DNA polymerase, S. pneumoniae DNA
polymerase I, Tfl DNA polymerase, D. radiodurans DNA polymerase I, Tth DNA
polymerase, Tth XL DNA polymerase, M. tuberculosis DNA polymerase I, M.
thermoautotrophicum DNA polymerase I, Herpes simplex-1 DNA polymerase, E. coli DNA
polymerase I Klenow fragment, Vent DNA polymerase, thermosequenase and wild-type or
modified T7 DNA polymerases. In preferred embodiments, the effective polymerase is E. 
coli DNA polymerase I, M. tuberculosis DNA polymerase I or Taq DNA polymerase. 

[0246] Where the break in the substantially double stranded nucleic acid template 
is a gap of at least a base or nucleotide in length that comprises, or is reacted to comprise, a 3' 
hydroxyl group, the range of effective polymerases that may be used is even broader. In such
aspects, the effective polymerase may be, for example, E. coli DNA polymerase I, Taq DNA
polymerase, S. pneumoniae DNA polymerase I, Tfl DNA polymerase, D. radiodurans DNA
polymerase I, Tth DNA polymerase, Tth XL DNA polymerase, M. tuberculosis DNA
polymerase I, M. thermoautotrophicum DNA polymerase I, Herpes simplex-1 DNA
polymerase, E. coli DNA polymerase I Klenow fragment, T4 DNA polymerase, Vent DNA
polymerase, thermosequenase or a wild-type or modified T7 DNA polymerase. In preferred
aspects, the effective polymerase is E. coli DNA polymerase I, M. tuberculosis DNA
polymerase I, Taq DNA polymerase or T4 DNA polymerase. 

VII. HYBRIDIZATION

[0247] PENTAmer synthesis requires the use of primers which hybridize to 
specific sequences. Further, PENT reaction products may be useful as probes in
hybridization analysis. The use of a probe or primer of between about 13 and 100 
nucleotides, preferably between about 17 and 100 nucleotides in length, or in some aspects of 
the invention up to about 1-2 Kb or more in length, allows the formation of a duplex 
molecule that is both stable and selective. Molecules having complementary sequences over 
contiguous stretches greater than about 20 bases in length are generally preferred, to increase 
stability and/or selectivity of the hybrid molecules obtained. One will generally prefer to 
design nucleic acid molecules for hybridization having one or more complementary 
sequences of 20 to 30 nucleotides, or even longer where desired. Such fragments may be 
readily prepared, for example, by directly synthesizing the fragment by chemical means or by 
introducing selected sequences into recombinant vectors for recombinant production. 

[0248] Depending on the application envisioned, one would desire to employ 
varying conditions of hybridization to achieve varying degrees of selectivity of the probe or
primers for the target sequence. For applications requiring high selectivity, one will typically
desire to employ relatively high stringency conditions to form the hybrids, for example,
relatively low salt and/or high temperature conditions, such as provided by about 0.02 M to
about 0.10 M NaCl at temperatures of about 50°C to about 70°C. Such high stringency
conditions tolerate little, if any, mismatch between the probe or primers and the template or 
target strand and would be particularly suitable for isolating specific genes or for detecting 
specific mRNA transcripts. It is generally appreciated that conditions can be rendered more 
stringent by the addition of increasing amounts of formamide. 

[0249] Conditions may be rendered less stringent by increasing salt concentration
and/or decreasing temperature. For example, a medium stringency condition could be
provided by about 0.1 to 0.25 M NaCl at temperatures of about 37°C to about 55°C, while a
low stringency condition could be provided by about 0.15 M to about 0.9 M salt, at
temperatures ranging from about 20°C to about 55°C. Hybridization conditions can be
readily manipulated depending on the desired results.
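
The effect of salt and formamide on stringency can be illustrated with an estimate of duplex melting temperature. The sketch below uses one commonly cited empirical approximation for DNA duplexes; it is not taken from this specification, its coefficients vary somewhat between references, and it is generally considered more reliable for longer hybrids than for short oligonucleotides. The function name estimate_tm_celsius and the example values are hypothetical.

```python
import math


def estimate_tm_celsius(gc_percent: float, length_nt: int,
                        sodium_molar: float, formamide_percent: float = 0.0) -> float:
    """Approximate melting temperature of a DNA duplex (empirical, illustrative only)."""
    return (81.5
            + 16.6 * math.log10(sodium_molar)   # more salt stabilizes the duplex
            + 0.41 * gc_percent                 # G+C content raises the Tm
            - 600.0 / length_nt                 # shorter duplexes melt at lower temperature
            - 0.63 * formamide_percent)         # formamide destabilizes the duplex


if __name__ == "__main__":
    for salt in (0.02, 0.1, 0.9):
        tm = estimate_tm_celsius(gc_percent=50, length_nt=100, sodium_molar=salt)
        print(f"{salt:4.2f} M Na+: Tm ~ {tm:5.1f} C")
```

At a fixed hybridization temperature, lowering the salt concentration or adding formamide brings the estimated melting temperature closer to the reaction temperature, so that mismatched hybrids are less likely to persist; this is the sense in which such conditions are more stringent.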

[0250] In other embodiments, hybridization may be achieved under conditions of,
for example, 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at
temperatures between approximately 20°C and about 37°C. Other hybridization conditions
utilized could include approximately 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2,
at temperatures ranging from approximately 40°C to about 72°C.

VIII. AMPLIFICATION OF NUCLEIC ACIDS

[0251] Nucleic acids useful as templates for amplification may be isolated from
cells, tissues or other samples according to standard methodologies (Sambrook et al., 1989).
In certain embodiments, analysis is performed on whole cell or tissue homogenates or
biological fluid samples without substantial purification of the template nucleic acid. The
nucleic acid may be genomic DNA or fractionated or whole cell RNA. Where RNA is used,
it may be desired to first convert the RNA to a complementary DNA.

[0252] The term "primer," as used herein, is meant to encompass any nucleic acid 
that is capable of priming the synthesis of a nascent nucleic acid in a template-dependent 
process. Typically, primers are oligonucleotides from ten to twenty and/or thirty base pairs in 
length, but longer sequences can be employed. Primers may be provided in double-stranded 
and/or single-stranded form, although the single-stranded form is preferred. 

[0253] Pairs of primers designed to selectively hybridize to nucleic acids are 
contacted with the template nucleic acid under conditions that permit selective hybridization. 
Depending upon the desired application, high stringency hybridization conditions may be
selected that will only allow hybridization to sequences that are completely complementary to
the primers. In other embodiments, hybridization may occur under reduced stringency to
allow for amplification of nucleic acids that contain one or more mismatches with the primer
sequences. Once hybridized, the template-primer complex is contacted with one or more 
enzymes that facilitate template-dependent nucleic acid synthesis. Multiple rounds of 
amplification, also referred to as "cycles," are conducted until a sufficient amount of 
amplification product is produced. 

[0254] The amplification product may be detected or quantified. In certain 
applications, the detection may be performed by visual means. Alternatively, the detection
may involve indirect identification of the product via chemiluminescence, radioactive
scintigraphy of incorporated radiolabel or fluorescent label or even via a system using 
electrical and/or thermal impulse signals (Affymax technology). 

[0255] A number of template dependent processes are available to amplify the
oligonucleotide sequences present in a given template sample. One of the best known
amplification methods is the polymerase chain reaction (referred to as PCR™) which is
described in detail in U.S. Patent Nos. 4,683,195, 4,683,202 and 4,800,159, and in Innis et
al., 1990, each of which is incorporated herein by reference in its entirety. Briefly, two
synthetic oligonucleotide primers, which are complementary to two regions of the template
DNA (one for each strand) to be amplified, are added to the template DNA (that need not be
pure), in the presence of excess deoxynucleotides (dNTPs) and a thermostable polymerase,
such as, for example, Taq (Thermus aquaticus) DNA polymerase. In a series (typically 30-35)
of temperature cycles, the target DNA is repeatedly denatured (around 90°C), annealed to
the primers (typically at 50-60°C) and a daughter strand extended from the primers (72°C).
As the daughter strands are created they act as templates in subsequent cycles. Thus the
template region between the two primers is amplified exponentially, rather than linearly.
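
The difference between exponential and linear amplification can be made concrete with a short calculation. The sketch below is an idealized model offered only for illustration: it assumes a constant per-cycle efficiency (1.0 meaning perfect doubling), which real reactions do not maintain in late cycles, and the function names are hypothetical.

```python
def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Exponential model: every product strand serves as template in later cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def linear_copies(initial_copies: float, cycles: int) -> float:
    """Linear model: only the original template is copied, once per cycle."""
    return initial_copies * (1 + cycles)


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, f"{copies_after_cycles(1, n):.3g}", linear_copies(1, n))
```

Under these assumptions a single template molecule yields on the order of a billion copies after 30 cycles of perfect doubling, whereas copying only the original template would yield 31.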

[0256] A reverse transcriptase PCR™ amplification procedure may be performed
to quantify the amount of mRNA amplified. Methods of reverse transcribing RNA into
cDNA are well known and described in Sambrook et al., 1989. Alternative methods for
reverse transcription utilize thermostable DNA polymerases. These methods are described in
WO 90/07641. Polymerase chain reaction methodologies are well known in the art. 
Representative methods of RT-PCR are described in U.S. Patent No. 5,882,864. 

A. LCR 

[0257] Another method for amplification is the ligase chain reaction ("LCR"),
disclosed in European Patent Application No. 320,308, incorporated herein by reference. In
LCR, two complementary probe pairs are prepared, and in the presence of the target
sequence, each pair will bind to opposite complementary strands of the target such that they
abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By
temperature cycling, as in PCR™, bound ligated units dissociate from the target and then
serve as "target sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750, 
incorporated herein by reference, describes a method similar to LCR for binding probe pairs 
to a target sequence. 

B. Qbeta Replicase

[0258] Qbeta Replicase, described in PCT Patent Application No.
PCT/US87/00880, also may be used as still another amplification method in the present
invention. In this method, a replicative sequence of RNA which has a region complementary 
to that of a target is added to a sample in the presence of an RNA polymerase. The 
polymerase will copy the replicative sequence which can then be detected. 

C. Isothermal Amplification 

[0259] An isothermal amplification method, in which restriction endonucleases 
and ligases are used to achieve the amplification of target molecules that contain nucleotide 
5'-[α-thio]-triphosphates in one strand of a restriction site also may be useful in the
amplification of nucleic acids in the present invention. Such an amplification method is
described by Walker et al., 1992, incorporated herein by reference.

D. Strand Displacement Amplification 

[0260] Strand Displacement Amplification (SDA) is another method of carrying
out isothermal amplification of nucleic acids which involves multiple rounds of strand
displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain
Reaction (RCR), involves annealing several probes throughout a region targeted for 
amplification, followed by a repair reaction in which only two of the four bases are present. 
The other two bases can be added as biotinylated derivatives for easy detection. A similar 
approach is used in SDA. 

E. Cyclic Probe Reaction 

[0261] Target specific sequences can also be detected using a cyclic probe 
reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a 
middle sequence of specific RNA is hybridized to DNA which is present in a sample. Upon 
hybridization, the reaction is treated with RNase H, and the products of the probe identified 
as distinctive products which are released after digestion. The original template is annealed 
to another cycling probe and the reaction is repeated.

F. Transcription-Based Amplification 

[0262] Other nucleic acid amplification procedures include transcription-based 
amplification systems (TAS), including nucleic acid sequence based amplification (NASBA) 
and 3SR (Kwoh et al., 1989; PCT Patent Application WO 88/10315 et al., 1989, each
incorporated herein by reference).

[0263] In NASBA, the nucleic acids can be prepared for amplification by standard
phenol/chloroform extraction, heat denaturation of a clinical sample, treatment with lysis
buffer and minispin columns for isolation of DNA and RNA or guanidinium chloride
extraction of RNA. These amplification techniques involve annealing a primer which has
target specific sequences. Following polymerization, DNA/RNA hybrids are digested with
RNase H while double stranded DNA molecules are heat denatured again. In either case the
single stranded DNA is made fully double stranded by addition of second target specific
primer, followed by polymerization. The double-stranded DNA molecules are then multiply
transcribed by a polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs
are reverse transcribed into double stranded DNA, and transcribed once again with a
polymerase such as T7 or SP6. The resulting products, whether truncated or complete,
indicate target specific sequences. 

Other Amplification Methods 

[0264] Other amplification methods, as described in British Patent Application 
No. GB 2,202,328, and in PCT Patent Application No. PCT/US89/01025, each incorporated
herein by reference, may be used in accordance with the present invention. In the former
application, "modified" primers are used in a PCR™-like, template and enzyme dependent
synthesis. The primers may be modified by labeling with a capture moiety (e.g., biotin)
and/or a detector moiety (e.g., enzyme). In the latter application, an excess of labeled probes
are added to a sample. In the presence of the target sequence, the probe binds and is cleaved 
catalytically. After cleavage, the target sequence is released intact to be bound by excess 
probe. Cleavage of the labeled probe signals the presence of the target sequence. 

[0265] Miller et al., PCT Patent Application WO 89/06700 (incorporated herein
by reference) disclose a nucleic acid sequence amplification scheme based on the
hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA")
followed by transcription of many RNA copies of the sequence. This scheme is not cyclic,
i.e., new templates are not produced from the resultant RNA transcripts.

[0266] Other suitable amplification methods include "RACE" and "one-sided
PCR™" (Frohman, 1990; Ohara et al., 1989, each herein incorporated by reference).
Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid
having the sequence of the resulting "di-oligonucleotide", thereby amplifying the
di-oligonucleotide, also may be used in the amplification step of the present invention (Wu et
al., 1989, incorporated herein by reference).

IX. DETECTION OF NUCLEIC ACIDS

[0267] Following any amplification, it may be desirable to separate the 
amplification product from the template and/or the excess primer. In one embodiment,
amplification products are separated by agarose, agarose-acrylamide or polyacrylamide gel
electrophoresis using standard methods (Sambrook et al., 1989). Separated amplification
products may be cut out and eluted from the gel for further manipulation. Using low melting
point agarose gels, the separated band may be removed by heating the gel, followed by 
extraction of the nucleic acid. 

[0268] Separation of nucleic acids may also be effected by chromatographic
techniques known in the art. There are many kinds of chromatography which may be used in the
practice of the present invention, including adsorption, partition, ion-exchange,
hydroxylapatite, molecular sieve, reverse-phase, column, paper, thin-layer, and gas
chromatography as well as HPLC. 

[0269] In certain embodiments, the amplification products are visualized. A 
typical visualization method involves staining of a gel with ethidium bromide and 
visualization of bands under UV light. Alternatively, if the amplification products are 
integrally labeled with radio- or fluorometrically-labeled nucleotides, the separated 
amplification products can be exposed to x-ray film or visualized under the appropriate
excitatory spectra. 

[0270] In one embodiment, following separation of amplification products, a 
labeled nucleic acid probe is brought into contact with the amplified marker sequence. The 
probe preferably is conjugated to a chromophore but may be radiolabeled. In another 
embodiment, the probe is conjugated to a binding partner, such as an antibody or biotin, or
another binding partner carrying a detectable moiety. 

[0271] In particular embodiments, detection is by Southern blotting and
hybridization with a labeled probe. The techniques involved in Southern blotting are well
known to those of skill in the art. See Sambrook et al., 1989. One example of the foregoing
is described in U.S. Patent No. 5,279,721, incorporated by reference herein, which discloses
an apparatus and method for the automated electrophoresis and transfer of nucleic acids. The
apparatus permits electrophoresis and blotting without external manipulation of the gel and is
ideally suited to carrying out methods according to the present invention.

[0272] Other methods of nucleic acid detection that may be used in the practice of
the instant invention are disclosed in U.S. Patent Nos. 5,840,873, 5,843,640, 5,843,651, 
5,846,708, 5,846,717, 5,846,726, 5,846,729, 5,849,487, 5,853,990, 5,853,992, 5,853,993, 
5,856,092, 5,861,244, 5,863,732, 5,863,753, 5,866,331, 5,905,024, 5,910,407, 5,912,124, 
5,912,145, 5,919,630, 5,925,517, 5,928,862, 5,928,869, 5,929,227, 5,932,413 and 5,935,791, 
each of which is incorporated herein by reference. 

X. SEPARATION AND QUANTITATION METHODS 

[0273] Following amplification, it may be desirable to separate the amplification 
products of several different lengths from each other and from the template and the excess
primer for the purpose of analysis or more specifically for determining whether specific
amplification has occurred. 

A. Gel electrophoresis

[0274] In one embodiment, amplification products are separated by agarose,
agarose-acrylamide or polyacrylamide gel electrophoresis using standard methods (Sambrook
et al., 1989).

[0275] Separation by electrophoresis is based upon the differential migration 
through a gel according to the size and ionic charge of the molecules in an electrical field. 
High resolution techniques normally use a gel support for the fluid phase. Examples of gels 
used are starch, acrylamide, agarose or mixtures of acrylamide and agarose. Frictional 
resistance produced by the support causes size, rather than charge alone, to become the major 
determinant of separation. Smaller molecules with a more negative charge will travel faster 
and further through the gel toward the anode of an electrophoretic cell when high voltage is 
applied. Similar molecules will group on the gel. They may be visualized by staining and 
quantitated, in relative terms, using densitometers which continuously monitor the 
photometric density of the resulting stain. The electrolyte may be continuous (a single buffer) 
or discontinuous, where a sample is stacked by means of a buffer discontinuity before it 
enters the running gel/running buffer. The gel may be a single concentration or a gradient in 
which pore size decreases with migration distance. In SDS gel electrophoresis of proteins or 
electrophoresis of polynucleotides, mobility depends primarily on size and is used to 
determine molecular weight. In pulse field electrophoresis, two fields are applied alternately 
at right angles to each other to minimize diffusion mediated spread of large linear polymers. 

[0276] Agarose gel electrophoresis facilitates the separation of DNA or RNA 
based upon size in a matrix composed of a highly purified form of agar. Nucleic acids tend 
to become oriented in an end on position in the presence of an electric field. Migration 
through the gel matrices occurs at a rate inversely proportional to the log10 of the number of 
base pairs (Sambrook et al., 1989). 
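
The size dependence described above is commonly exploited by running a ladder of known fragments alongside the sample. The following is a minimal sketch, not part of the disclosed methods, which assumes the usual working approximation that migration distance is linear in log10 of fragment length over a limited size range. The function names and the ladder values in the usage note are hypothetical.

```python
import math


def fit_log_size_curve(ladder):
    """Fit log10(size_bp) = intercept + slope * distance by least squares.

    ladder: list of (size_bp, migration_distance_mm) pairs for marker bands.
    The values passed in are assumed to come from a size standard run on
    the same gel as the unknown sample.
    """
    if len(ladder) < 2:
        raise ValueError("at least two ladder bands are required")
    xs = [distance for _, distance in ladder]
    ys = [math.log10(size) for size, _ in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size_bp(distance_mm, intercept, slope):
    """Interpolate the size of an unknown band from its migration distance."""
    return 10 ** (intercept + slope * distance_mm)
```

For example, with a hypothetical ladder of 10000, 1000 and 100 bp bands at 10, 30 and 50 mm, `fit_log_size_curve` returns a negative slope (larger fragments migrate less far), and `estimate_size_bp` can then be applied to the measured distance of an amplification product. The approximation degrades outside the range spanned by the ladder.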

[0277] Polyacrylamide gel electrophoresis (PAGE) is an analytical and separative 
technique in which molecules, particularly proteins, are separated by their different 
electrophoretic mobilities in a hydrated gel. The gel suppresses convective mixing of the fluid 
phase through which the electrophoresis takes place and contributes molecular sieving. 
PAGE is commonly carried out in the presence of the anionic detergent sodium dodecylsulphate 
(SDS). SDS denatures proteins so that noncovalently associating subunit polypeptides 
migrate independently and, by binding to the proteins, confers a net negative charge roughly 
proportional to the chain weight. 

B. Chromatographic Techniques 

[0278] Alternatively, chromatographic techniques may be employed to effect 
separation. There are many kinds of chromatography which may be used in the present 
invention: adsorption, partition, ion-exchange and molecular sieve, and many specialized 
techniques for using them including column, paper, thin-layer and gas chromatography 
(Freifelder, 1982). In yet another alternative, cDNA products labeled with, for example, 
biotin or an antigen can be captured with beads bearing avidin or antibody, respectively. 

C. Microfluidic Techniques 

[0279] Microfluidic techniques include separation on a platform such as 
microcapillaries, designed by ACLARA Biosciences Inc., or the LabChip™ "liquid 
integrated circuits" made by Caliper Technologies Inc. These microfluidic platforms require 
only nanoliter volumes of sample, in contrast to the microliter volumes required by other 
separation technologies. Miniaturizing some of the processes involved in genetic analysis 
has been achieved using microfluidic devices. For example, published PCT Application No. 
WO 94/05414, to Northrup and White, incorporated herein by reference, reports an integrated 
micro-PCR™ apparatus for collection and amplification of nucleic acids from a specimen. 
U.S. Patent Nos. 5,304,487 and 5,296,375 discuss devices for collection and analysis of cell 
containing samples and are incorporated herein by reference. U.S. Patent No. 5,856,174 
describes an apparatus which combines the various processing and analytical operations 
involved in nucleic acid analysis and is incorporated herein by reference. 

D. Capillary Electrophoresis 

[0280] In some embodiments, it may be desirable to provide an additional, or 
alternative, means for analyzing the amplified genes. In these embodiments, microcapillary 
arrays are contemplated to be used for the analysis. 

[0281] Microcapillary array electrophoresis generally involves the use of a thin 
capillary or channel that may or may not be filled with a particular separation medium. 
Electrophoresis of a sample through the capillary provides a size based separation profile for 
the sample. The use of microcapillary electrophoresis in size separation of nucleic acids has 
been reported in, for example, Woolley and Mathies, 1994. Microcapillary array 
electrophoresis generally provides a rapid method for size-based sequencing, PCR™ product 
analysis and restriction fragment sizing. The high surface to volume ratio of these capillaries 
allows for the application of higher electric fields across the capillary without substantial 
thermal variation across the capillary, consequently allowing for more rapid separations. 
Furthermore, when combined with confocal imaging methods, these methods provide 
sensitivity in the range of attomoles, which is comparable to the sensitivity of radioactive 
sequencing methods. Microfabrication of microfluidic devices including microcapillary 
electrophoretic devices has been discussed in detail in, for example, Jacobsen et al., 1994; 
Effenhauser et al., 1994; Harrison et al., 1993; Effenhauser et al., 1993; Manz et al., 1992; 
and U.S. Patent No. 5,904,824, here incorporated by reference. Typically, these methods 
comprise photolithographic etching of micron scale channels on a silica, silicon or other 
crystalline substrate or chip, and can be readily adapted for use in the present invention. In 
some embodiments, the capillary arrays may be fabricated from the same polymeric materials 
described for the fabrication of the body of the device, using the injection molding techniques 
described herein. 

[0282] Tsuda et al., 1990, describes rectangular capillaries, an alternative to the 
cylindrical capillary glass tubes. Some advantages of these systems are their efficient heat 
dissipation due to the large height-to-width ratio and, hence, their high surface-to-volume 
ratio and their high detection sensitivity for optical on-column detection modes. These flat 
separation channels have the ability to perform two-dimensional separations, with one force 
being applied across the separation channel, and with the sample zones detected by the use of 
a multi-channel array detector. 

[0283] In many capillary electrophoresis methods, the capillaries, e.g., fused silica 
capillaries or channels etched, machined or molded into planar substrates, are filled with an 
appropriate separation/sieving matrix. Typically, a variety of sieving matrices known in 
the art may be used in the microcapillary arrays. Examples of such matrices include, e.g., 
hydroxyethyl cellulose, polyacrylamide, agarose and the like. Generally, the specific gel 
matrix, running buffers and running conditions are selected to maximize the separation 
characteristics of the particular application, e.g., the size of the nucleic acid fragments, the 
required resolution, and the presence of native or undenatured nucleic acid molecules. For 
example, running buffers may include denaturants, chaotropic agents such as urea or the like, 
to denature nucleic acids in the sample. 

E. Mass Spectroscopy 

[0284] Mass spectrometry provides a means of "weighing" individual molecules 
by ionizing the molecules in vacuo and making them "fly" by volatilization. Under the 
influence of combinations of electric and magnetic fields, the ions follow trajectories 
depending on their individual mass (m) and charge (z). For low molecular weight molecules, 
mass spectrometry has been part of the routine physical-organic repertoire for analysis and 
characterization of organic molecules by the determination of the mass of the parent 
molecular ion. In addition, by arranging collisions of this parent molecular ion with other 
particles (e.g., argon atoms), the molecular ion is fragmented, forming secondary ions by the 
so-called collision induced dissociation (CID). The fragmentation pattern/pathway very often 
allows the derivation of detailed structural information. Other applications of mass 
spectrometric methods known in the art can be found summarized in Methods in 
Enzymology, Vol. 193: "Mass Spectrometry" (McCloskey, editor), 1990, Academic Press, 
New York. 

[0285] Due to the apparent analytical advantages of mass spectrometry in 
providing high detection sensitivity, accuracy of mass measurements, detailed structural 
information by CID in conjunction with an MS/MS configuration and speed, as well as 
on-line data transfer to a computer, there has been considerable interest in the use of mass 
spectrometry for the structural analysis of nucleic acids. Reviews summarizing this field 
include Schram, 1990 and Crain, 1990, here incorporated by reference. The biggest hurdle to 
applying mass spectrometry to nucleic acids is the difficulty of volatilizing these very polar 
biopolymers. Therefore, "sequencing" had been limited to low molecular weight synthetic 
oligonucleotides by determining the mass of the parent molecular ion and through this, 
confirming the already known sequence, or alternatively, confirming the known sequence 
through the generation of secondary ions (fragment ions) via CID in an MS/MS configuration 
utilizing, in particular, for the ionization and volatilization, the method of fast atom 
bombardment (FAB mass spectrometry) or plasma desorption (PD mass spectrometry). As 
an example, the application of FAB to the analysis of protected dimeric blocks for chemical 
synthesis of oligodeoxynucleotides has been described (Koster et al., 1987). 

[0286] Two ionization/desorption techniques are electrospray/ionspray (ES) and 
matrix-assisted laser desorption/ionization (MALDI). ES mass spectrometry was introduced 
by Fenn, 1984; PCT Application No. WO 90/14148, and its applications are summarized in 
review articles, for example, Smith, 1990 and Ardrey, 1992. As a mass analyzer, a 
quadrupole is most frequently used. The determination of molecular weights in femtomole 
amounts of sample is very accurate due to the presence of multiple ion peaks which all could 
be used for the mass calculation. 
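
To illustrate how multiple ion peaks can each contribute to the mass calculation, the sketch below works through the standard textbook relationship between two adjacent charge states of the same molecule. It is an illustration only, not a method taken from the cited references. It assumes the peaks differ by exactly one charge and that charge is carried solely by gained or lost protons; the function name and the `polarity` parameter are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def deconvolve_adjacent_peaks(mz_high, mz_low, polarity=-1):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_high: m/z of the peak carrying charge z (the larger m/z value).
    mz_low:  m/z of the neighbouring peak carrying charge z + 1.
    polarity: -1 for negative-ion mode (protons lost, typical for nucleic
              acids), +1 for positive-ion mode (protons gained).

    Model (an assumption): m/z = M / z + polarity * PROTON_MASS.
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be greater than mz_low")
    adduct = polarity * PROTON_MASS
    # Subtracting the two model equations gives z directly.
    z = round((mz_low - adduct) / (mz_high - mz_low))
    mass = z * (mz_high - adduct)
    return z, mass
```

Applying the function to each pair of neighbouring peaks in a charge-state series yields several independent estimates of the same neutral mass, which can then be averaged.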

[0287] MALDI mass spectrometry, in contrast, can be particularly attractive when 
a time-of-flight (TOF) configuration is used as a mass analyzer. MALDI-TOF mass 
spectrometry was introduced by Hillenkamp, 1990. Since, in most cases, no multiple 
molecular ion peaks are produced with this technique, the mass spectra, in principle, look 
simpler compared to ES mass spectrometry. DNA molecules up to a molecular weight of 
410,000 daltons could be desorbed and volatilized (Williams, 1989). More recently, the 
use of infrared (IR) lasers in this technique (as opposed to UV lasers) has been shown to 
provide mass spectra of larger nucleic acids such as synthetic DNA, restriction enzyme 
fragments of plasmid DNA, and RNA transcripts up to a size of 2180 nucleotides 
(Berkenkamp, 1998). Berkenkamp also describes how DNA and RNA samples can be 
analyzed with limited sample purification using MALDI-TOF IR. 

[0288] In Japanese Patent No. 59-131909, an instrument is described which 
detects nucleic acid fragments separated either by electrophoresis, liquid chromatography or 
high speed gel filtration. Mass spectrometric detection is achieved by incorporating into the 
nucleic acids atoms which normally do not occur in DNA, such as S, Br, I or Ag, Au, Pt, Os, 
Hg. 

F. Energy Transfer 

[0289] Labeling hybridization oligonucleotide probes with fluorescent labels is a 
well known technique in the art and is a sensitive, nonradioactive method for facilitating 
detection of probe hybridization. More recently developed detection methods employ the 
process of fluorescence energy transfer (FET) rather than direct detection of fluorescence 
intensity for detection of probe hybridization. FET occurs between a donor fluorophore and 
an acceptor dye (which may or may not be a fluorophore) when the absorption spectrum of 
one (the acceptor) overlaps the emission spectrum of the other (the donor) and the two dyes 
are in close proximity. Dyes with these properties are referred to as donor/acceptor dye pairs 
or energy transfer dye pairs. The excited-state energy of the donor fluorophore is transferred 
by a resonance dipole-induced dipole interaction to the neighboring acceptor. This results in 
quenching of donor fluorescence. In some cases, if the acceptor is also a fluorophore, the 
intensity of its fluorescence may be enhanced. The efficiency of energy transfer is highly 
dependent on the distance between the donor and acceptor, and equations predicting these 
relationships have been developed by Forster, 1948. The distance between donor and 
acceptor dyes at which energy transfer efficiency is 50% is referred to as the Forster distance 
(R0). Other mechanisms of fluorescence quenching are also known including, for example, 
charge transfer and collisional quenching. 
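
The distance dependence referred to above is usually written as E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the Forster distance. The equation is not reproduced in this text; it is quoted here as the standard form of the Forster relationship. The short sketch below evaluates it, with the function name chosen purely for illustration.

```python
def fret_efficiency(r, r0):
    """Energy transfer efficiency for donor-acceptor separation r.

    r and r0 must be in the same length unit (for example, angstroms).
    Uses the standard Forster relation E = 1 / (1 + (r / r0) ** 6).
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)
```

At r equal to R0 the efficiency is 0.5 by definition. Halving the separation raises it to 64/65 (about 0.98) and doubling the separation lowers it to 1/65 (about 0.015), which is why cleavage or conformational change of a doubly labeled probe produces a large change in donor fluorescence.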

[0290] Energy transfer and other mechanisms which rely on the interaction of two 
dyes in close proximity to produce quenching are an attractive means for detecting or 
identifying nucleotide sequences, as such assays may be conducted in homogeneous formats. 
Homogeneous assay formats are simpler than conventional probe hybridization assays which 
rely on detection of the fluorescence of a single fluorophore label, as heterogeneous assays 
generally require additional steps to separate hybridized label from free label. Several formats 
for FET hybridization assays are reviewed in Nonisotopic DNA Probe Techniques (1992, 
Academic Press, Inc., pgs. 311-352). 

[0291] Homogeneous methods employing energy transfer or other mechanisms of 
fluorescence quenching for detection of nucleic acid amplification have also been described. 
Higuchi (1992), discloses methods for detecting DNA amplification in real-time by 
monitoring increased fluorescence of ethidium bromide as it binds to double-stranded DNA. 
The sensitivity of this method is limited because binding of the ethidium bromide is not target 
specific and background amplification products are also detected. Lee, 1993, discloses a 
real-time detection method in which a doubly-labeled detector probe is cleaved in a target 
amplification-specific manner during PCR™. The detector probe is hybridized downstream 
of the amplification primer so that the 5'-3' exonuclease activity of Taq polymerase digests 
the detector probe, separating two fluorescent dyes which form an energy transfer pair. 
Fluorescence intensity increases as the probe is cleaved. Published PCT application WO 
96/21144 discloses continuous fluorometric assays in which enzyme-mediated cleavage of 
nucleic acids results in increased fluorescence. Fluorescence energy transfer is suggested for 
use in the methods, but only in the context of a method employing a single fluorescent label 
which is quenched by hybridization to the target. 

[0292] Signal primers or detector probes which hybridize to the target sequence 
downstream of the hybridization site of the amplification primers have been described for use 
in detection of nucleic acid amplification (U.S. Pat. No. 5,547,861). The signal primer is 
extended by the polymerase in a manner similar to extension of the amplification primers. 
Extension of the amplification primer displaces the extension product of the signal primer in 
a target amplification-dependent manner, producing a double-stranded secondary 
amplification product which may be detected as an indication of target amplification. The 
secondary amplification products generated from signal primers may be detected by means of 
a variety of labels and reporter groups, restriction sites in the signal primer which are cleaved 
to produce fragments of a characteristic size, capture groups, and structural features such as 
triple helices and recognition sites for double-stranded DNA binding proteins. 

[0293] Many donor/acceptor dye pairs are known in the art and may be used in the 
present invention. These include, for example, fluorescein isothiocyanate 
(FITC)/tetramethylrhodamine isothiocyanate (TRITC), FITC/Texas Red™ (Molecular 
Probes), FITC/N-hydroxysuccinimidyl 1-pyrenebutyrate (PYB), FITC/eosin isothiocyanate 
(EITC), N-hydroxysuccinimidyl 1-pyrenesulfonate (PYS)/FITC, FITC/Rhodamine X, 
FITC/tetramethylrhodamine (TAMRA), and others. The selection of a particular 
donor/acceptor fluorophore pair is not critical. For energy transfer quenching mechanisms it 
is only necessary that the emission wavelengths of the donor fluorophore overlap the 
excitation wavelengths of the acceptor, i.e., there must be sufficient spectral overlap between 
the two dyes to allow efficient energy transfer, charge transfer or fluorescence quenching. 
P-(dimethyl aminophenylazo) benzoic acid (DABCYL) is a non-fluorescent acceptor dye which 
effectively quenches fluorescence from an adjacent fluorophore, e.g., fluorescein or 
5-(2'-aminoethyl) aminonaphthalene (EDANS). Any dye pair which produces fluorescence 
quenching in the detector nucleic acids of the invention is suitable for use in the methods of 
the invention, regardless of the mechanism by which quenching occurs. Terminal and 
internal labeling methods are both known in the art and may be routinely used to link the 
donor and acceptor dyes at their respective sites in the detector nucleic acid. 

G. Chip Technologies 

[0294] DNA arrays and gene chip technology provide a means of rapidly 
screening a large number of DNA samples for their ability to hybridize to a variety of single 
stranded DNA probes immobilized on a solid substrate. Specifically contemplated are 
chip-based DNA technologies such as those described by Hacia et al. (1996) and Shoemaker 
et al. (1996). These techniques involve quantitative methods for analyzing large numbers of 
genes rapidly and accurately. The technology capitalizes on the complementary binding 
properties of single stranded DNA to screen DNA samples by hybridization (Pease et al., 
1994; Fodor et al., 1991). Basically, a DNA array or gene chip consists of a solid substrate 
upon which an array of single stranded DNA molecules has been attached. For screening, 
the chip or array is contacted with a single stranded DNA sample which is allowed to 
hybridize under stringent conditions. The chip or array is then scanned to determine which 
probes have hybridized. In the context of this embodiment, such probes could include 
synthesized oligonucleotides, cDNA, genomic DNA, yeast artificial chromosomes (YACs), 
bacterial artificial chromosomes (BACs), chromosomal markers or other constructs a person 
of ordinary skill would recognize as adequate to demonstrate a genetic change. 

[0295] A variety of gene chip or DNA array formats are described in the art, for 
example US Patent Nos. 5,861,242 and 5,578,832 which are expressly incorporated herein by 
reference. A means for applying the disclosed methods to the construction of such a chip or 
array would be clear to one of ordinary skill in the art. In brief, the basic structure of a gene 
chip or array comprises: (1) an excitation source; (2) an array of probes; (3) a sampling 
element; (4) a detector; and (5) a signal amplification/treatment system. A chip may also 
include a support for immobilizing the probe. 

[0296] In particular embodiments, a target nucleic acid may be tagged or labeled 
with a substance that emits a detectable signal; for example, luminescence. The target nucleic 
acid may be immobilized onto the integrated microchip that also supports a phototransducer 
and related detection circuitry. Alternatively, a gene probe may be immobilized onto a 
membrane or filter which is then attached to the microchip or to the detector surface itself. In 
a further embodiment, the immobilized probe may be tagged or labeled with a substance that 
emits a detectable or altered signal when combined with the target nucleic acid. The tagged 
or labeled species may be fluorescent, phosphorescent, or otherwise luminescent, or it may 
emit Raman energy or it may absorb energy. When the probes selectively bind to a targeted 
species, a signal is generated that is detected by the chip. The signal may then be processed 
in several ways, depending on the nature of the signal. 

[0297] The DNA probes may be directly or indirectly immobilized onto a 
transducer detection surface to ensure optimal contact and maximum detection. The ability to 
directly synthesize on or attach polynucleotide probes to solid substrates is well known in the 
art. See U.S. Patent Nos. 5,837,832 and 5,837,860, both of which are expressly incorporated 
by reference. A variety of methods have been utilized to either permanently or removably 
attach the probes to the substrate. Exemplary methods include: the immobilization of 
biotinylated nucleic acid molecules to avidin/streptavidin coated supports (Holmstrom, 
1993), the direct covalent attachment of short, 5'-phosphorylated primers to chemically 
modified polystyrene plates (Rasmussen et al., 1991), or the precoating of the polystyrene or 
glass solid phases with poly-L-Lys or poly L-Lys, Phe, followed by the covalent attachment 
of either amino- or sulfhydryl-modified oligonucleotides using bifunctional crosslinking 
reagents (Running et al., 1990; Newton et al., 1993). When immobilized onto a substrate, 
the probes are stabilized and therefore may be used repeatedly. In general terms, 
hybridization is performed on an immobilized nucleic acid target, or a probe molecule is 
attached to a solid surface such as nitrocellulose, nylon membrane or glass. Numerous other 
matrix materials may be used, including reinforced nitrocellulose membrane, activated 
quartz, activated glass, polyvinylidene difluoride (PVDF) membrane, polystyrene substrates, 
polyacrylamide-based substrate, other polymers such as poly(vinyl chloride), poly(methyl 
methacrylate), poly(dimethyl siloxane), and photopolymers (which contain photoreactive species 
such as nitrenes, carbenes and ketyl radicals capable of forming covalent links with target 
molecules). 

[0298] Binding of the probe to a selected support may be accomplished by any of 
several means. For example, DNA is commonly bound to glass by first silanizing the glass 
surface, then activating with carbodiimide or glutaraldehyde. Alternative procedures may use 
reagents such as 3-glycidoxypropyltrimethoxysilane (GOP) or aminopropyltrimethoxysilane 
(APTS) with DNA linked via amino linkers incorporated either at the 3' or 5' end of the 
molecule during DNA synthesis. DNA may be bound directly to membranes using 
ultraviolet radiation. With nitrocellulose membranes, the DNA probes are spotted onto the 
membranes. A UV light source (Stratalinker, from Stratagene, La Jolla, CA) is used to 
irradiate DNA spots and induce cross-linking. An alternative method for cross-linking 
involves baking the spotted membranes at 80°C for two hours in vacuum. 

[0299] Specific DNA probes may first be immobilized onto a membrane and then 
attached to a membrane in contact with a transducer detection surface. This method avoids 
binding the probe onto the transducer and may be desirable for large-scale production. 
Membranes particularly suitable for this application include nitrocellulose membrane (e.g., 
from BioRad, Hercules, CA) or polyvinylidene difluoride (PVDF) (BioRad, Hercules, CA) or 
nylon membrane (Zeta-Probe, BioRad) or polystyrene base substrates (DNA.BIND™ Costar, 
Cambridge, MA). 

XI. IDENTIFICATION METHODS 

[0300] Amplification products must be visualized in order to confirm 
amplification of the target-gene(s) sequences. One typical visualization method involves 
staining of a gel with, for example, a fluorescent dye, such as ethidium bromide or Vista 
Green and visualization under UV light. Alternatively, if the amplification products are 
integrally labeled with radio- or fluorometrically-labeled nucleotides, the amplification 
products can then be exposed to x-ray film or visualized under the appropriate stimulating 
spectra, following separation. 

[0301] In one embodiment, visualization is achieved indirectly, using a nucleic 
acid probe. Following separation of amplification products, a labeled, nucleic acid probe is 
brought into contact with the amplified gene(s) sequence. The probe preferably is conjugated 
to a chromophore but may be radiolabeled. In another embodiment, the probe is conjugated 
to a binding partner, such as an antibody or biotin, where the other member of the binding 
pair carries a detectable moiety. In other embodiments, the probe incorporates a fluorescent 
dye or label. In yet other embodiments, the probe has a mass label that can be used to detect 
the molecule amplified. Other embodiments also contemplate the use of Taqman™ and 
Molecular Beacon™ probes. In still other embodiments, solid-phase capture methods 
combined with a standard probe may be used as well. 

[0302] The type of label incorporated in PCR™ products is dictated by the 
method used for analysis. When using capillary electrophoresis, microfluidic electrophoresis, 
HPLC, or LC separations, either incorporated or intercalated fluorescent dyes are used to 
label and detect the PCR™ products. Samples are detected dynamically, in that fluorescence 
is quantitated as a labeled species moves past the detector. If any electrophoretic method, 
HPLC, or LC is used for separation, products can be detected by absorption of UV light, a 
property inherent to DNA and therefore not requiring addition of a label. If polyacrylamide 
gel or slab gel electrophoresis is used, primers for the PCR™ can be labeled with a 
fluorophore, a chromophore or a radioisotope, or by associated enzymatic reaction. 
Enzymatic detection involves binding an enzyme to primer, e.g., via a biotin:avidin 
interaction, following separation of PCR™ products on a gel, then detection by chemical 
reaction, such as chemiluminescence generated with luminol. A fluorescent signal can be 
monitored dynamically. Detection with a radioisotope or enzymatic reaction requires an 
initial separation by gel electrophoresis, followed by transfer of DNA molecules to a solid 
support (blot) prior to analysis. If blots are made, they can be analyzed more than once by 
probing, stripping the blot, and then reprobing. If PCR™ products are separated using a mass 
spectrometer, no label is required because nucleic acids are detected directly. 

[0303] A number of the above separation platforms can be coupled to achieve 
separations based on two different properties. For example, some of the PCR™ primers can 
be coupled with a moiety that allows affinity capture, and some primers remain unmodified. 
Modifications can include a sugar (for binding to a lectin column), a hydrophobic group (for 
binding to a reverse-phase column), biotin (for binding to a streptavidin column), or an 
antigen (for binding to an antibody column). Samples are run through an affinity 
chromatography column. The flow-through fraction is collected, and the bound fraction 
eluted (by chemical cleavage, salt elution, etc.). Each sample is then further fractionated 
based on a property, such as mass, to identify individual components. 

XII. SEQUENCING 

[0304] It is envisioned that amplified product will commonly be sequenced for 
further identification. Sanger dideoxy-termination sequencing is the means commonly 
employed to determine nucleotide sequence. The Sanger method employs a short 
oligonucleotide or primer that is annealed to a single-stranded template containing the DNA 
to be sequenced. The primer provides a 3' hydroxyl group which allows the polymerization 
of a chain of DNA when a polymerase enzyme and dNTPs are provided. The Sanger method 
is an enzymatic reaction that utilizes chain-terminating dideoxynucleotides (ddNTPs). 
ddNTPs are chain-terminating because they lack a 3'-hydroxyl residue which prevents 
formation of a phosphodiester bond with a succeeding deoxyribonucleotide (dNTP). A small 
amount of one ddNTP is included with the four conventional dNTPs in a polymerization 
reaction. Polymerization or DNA synthesis is catalyzed by a DNA polymerase. There is 
competition between extension of the chain by incorporation of the conventional dNTPs and 
termination of the chain by incorporation of a ddNTP. 
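
The chain-termination logic above can be sketched in a few lines. This is a simplified illustration, not the sequencing protocol itself: it assumes four separate reactions (one ddNTP each), an error-free polymerase, an uppercase A/C/G/T sequence, and a single constant probability of ddNTP incorporation at each matching position. All function and parameter names are hypothetical.

```python
def termination_fragments(synthesized, primer_len=0):
    """Return, for each ddNTP reaction, the lengths of terminated products.

    synthesized: the newly synthesized strand read 5' to 3' from the primer
                 (uppercase A, C, G, T only).
    primer_len:  length of the primer, added to every product length.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        # A chain can terminate here only in the reaction holding this ddNTP.
        lanes[base].append(primer_len + position)
    return lanes


def read_ladder(lanes):
    """Reconstruct the sequence by reading bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def termination_fraction(k, p):
    """Fraction of chains stopping at the k-th matching position.

    p is the assumed per-position probability that a ddNTP, rather than
    the corresponding dNTP, is incorporated.
    """
    return p * (1.0 - p) ** (k - 1)
```

For the sequence `GATTACA` with a 20-nucleotide primer, the ddATP reaction yields products of 22, 25 and 27 nucleotides, and `read_ladder` recovers the original sequence. `termination_fraction` shows why only a small amount of ddNTP is used: a high `p` depletes long products, whereas a low `p` spreads signal across many lengths.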

[0305] Although a variety of polymerases may be used, the use of a modified T7 
DNA polymerase (Sequenase™) was a significant improvement over the original Sanger 
method (Sambrook et al., 1988; Hunkapiller, 1991). T7 DNA polymerase does not have any 
inherent 5'-3' exonuclease activity and has a reduced selectivity against incorporation of 
ddNTP. However, the 3'-5' exonuclease activity leads to degradation of some of the 
oligonucleotide primers. Sequenase™ is a chemically-modified T7 DNA polymerase that has 
reduced 3' to 5' exonuclease activity (Tabor et al., 1987). Sequenase™ version 2.0 is a 
genetically engineered form of the T7 polymerase which completely lacks 3' to 5' 
exonuclease activity. Sequenase™ has a very high processivity and high rate of 
polymerization. It can efficiently incorporate nucleotide analogs such as dITP and 
7-deaza-dGTP which are used to resolve regions of compression in sequencing gels. In regions of 
DNA containing a high G+C content, Hoogsteen bond formation can occur which leads to 
compressions in the DNA. These compressions result in aberrant migration patterns of 
oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with 
conventional nucleotides, intrastrand secondary structures during electrophoresis are 
alleviated. In contrast, Klenow does not incorporate these analogs as efficiently. 

[0306] The use of Taq DNA polymerase and mutants thereof is a more recent 
addition to the improvements of the Sanger method (U.S. Patent No. 5,075,216). Taq 
polymerase is a thermostable enzyme which works efficiently at 70-75°C. The ability to 
catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing 
templates which have extensive secondary structures at 37°C (the standard temperature used 
for Klenow and Sequenase™ reactions). Taq polymerase, like Sequenase™, has a high 
degree of processivity and, like Sequenase 2.0, it lacks 3' to 5' nuclease activity. The thermal 
stability of Taq and related enzymes (such as Tth and Thermosequenase™) provides an 
advantage over T7 polymerase (and all mutants thereof) in that these thermally stable 
enzymes can be used for cycle sequencing which amplifies the DNA during the sequencing 
reaction, thus allowing sequencing to be performed on smaller amounts of DNA. 
Optimization of the use of Taq in the standard Sanger Method has focused on modifying Taq 
to eliminate the intrinsic 5'-3' exonuclease activity and to increase its ability to incorporate 
ddNTPs to reduce incorrect termination due to secondary structure in the single-stranded 
template DNA (EP 0 655 506 B1). The introduction of fluorescently labeled nucleotides has 
further allowed the introduction of automated sequencing which further increases 
processivity. 

XIII. DNA IMMOBILIZATION 

[0307] Immobilization of the DNA may be achieved by a variety of methods 
involving either non-covalent or covalent interactions between the immobilized DNA 
comprising an anchorable moiety and an anchor. In a preferred embodiment of the invention, 
immobilization consists of the non-covalent coating of a solid phase with streptavidin or 
avidin and the subsequent immobilization of a biotinylated polynucleotide (Holmstrom, 
1993). It is further envisioned that immobilization may occur by precoating a polystyrene or 
glass solid phase with poly-L-Lys or poly L-Lys, Phe, followed by the covalent attachment of 
either amino- or sulfhydryl-modified polynucleotides using bifunctional crosslinking reagents 
(Running, 1990 and Newton, 1993). 

[0308] Immobilization may also take place by the direct covalent attachment of 
short, 5'-phosphorylated primers to chemically modified polystyrene plates ("Covalink" 
plates, Nunc) (Rasmussen, 1991). The covalent bond between the modified oligonucleotide 
and the solid phase surface is introduced by condensation with a water-soluble carbodiimide. 
This method facilitates a predominantly 5'-attachment of the oligonucleotides via their 
5'-phosphates. 

[0309] Nikiforov et al. (U.S. Patent No. 5,610,287, incorporated herein by reference) 
describes a method of non-covalently immobilizing nucleic acid molecules in the presence of 
a salt or cationic detergent on a hydrophilic polystyrene solid support containing a 
hydrophilic moiety or on a glass solid support. The support is contacted with a solution 
having a pH of about 6 to about 8 containing the synthetic nucleic acid and a cationic 
detergent or salt. The support containing the immobilized nucleic acid may be washed with 
an aqueous solution containing a non-ionic detergent without removing the attached 
molecules. 

[0310] Another commercially available method envisioned by the inventors to 
facilitate immobilization is the "Reacti-Bind™ DNA Coating Solutions" (see 
"Instructions-Reacti-Bind™ DNA Coating Solution" 1/1997). This product comprises a 
solution that is mixed with DNA and applied to surfaces such as polystyrene or 
polypropylene. After overnight incubation, the solution is removed, the surface washed with 
buffer and dried, after which it is ready for hybridization. It is envisioned that similar 
products, i.e., Costar "DNA-BIND™" or Immobilon-AV Affinity Membrane (IAV, Millipore, 
Bedford, MA) are equally applicable to immobilize the respective fragment. 

XIV. ANALYSIS OF DATA 

[0311] Gathering data from the various analysis operations will typically be 
carried out using methods known in the art. For example, microcapillary arrays may be 
scanned using lasers to excite fluorescently labeled targets that have hybridized to regions of 
probe arrays, which can then be imaged using charge-coupled devices ("CCDs") for a wide 
field scanning of the array. Alternatively, another particularly useful method for gathering 
data from the arrays is through the use of laser confocal microscopy which combines the ease 
and speed of a readily automated process with high resolution detection. Scanning devices of 
this kind are described in U.S. Patent Nos. 5,143,854 and 5,424,186. 

[0312] Following the data gathering operation, the data will typically be reported 
to a data analysis operation. To facilitate the sample analysis operation, the data obtained by a 
reader from the device will typically be analyzed using a digital computer. Typically, the 
computer will be appropriately programmed for receipt and storage of the data from the 
device, as well as for analysis and reporting of the data gathered, i.e., interpreting 
fluorescence data to determine the sequence of hybridizing probes, normalization of 
background and single base mismatch hybridizations, ordering of sequence data in SBH 
applications, and the like, as described in, e.g., U.S. Patent Nos. 4,683,194; 5,599,668; and 
5,843,651, each of which is incorporated herein by reference. 

XV. PLANTS 

[0313] The term "plant," as used herein, refers to any type of plant. The inventors 
have provided below an exemplary description of some plants that may be used with the 
invention. However, the list is not in any way limiting, as other types of plants will be known 
to those of skill in the art and could be used with the invention. 

[0314] A common class of plants exploited in agriculture are vegetable crops, 
including artichokes, kohlrabi, arugula, leeks, asparagus, lettuce (e.g., head, leaf, romaine), 
bok choy, malanga, broccoli, melons (e.g., muskmelon, watermelon, crenshaw, honeydew, 
cantaloupe), brussels sprouts, cabbage, cardoni, carrots, napa, cauliflower, okra, onions, 
celery, parsley, chick peas, parsnips, chicory, Chinese cabbage, peppers, collards, potatoes, 
cucumber plants (marrows, cucumbers), pumpkins, cucurbits, radishes, dry bulb onions, 
rutabaga, eggplant, salsify, escarole, shallots, endive, garlic, spinach, green onions, squash, 
greens, beet (sugar beet and fodder beet), sweet potatoes, swiss chard, horseradish, tomatoes, 
kale, turnips, and spices. 

[0315] Other types of plants frequently finding commercial use include fruit and 
vine crops such as apples, apricots, cherries, nectarines, peaches, pears, plums, prunes, quince, 
almonds, chestnuts, filberts, pecans, pistachios, walnuts, citrus, blueberries, boysenberries, 
cranberries, currants, loganberries, raspberries, strawberries, blackberries, grapes, avocados, 
bananas, kiwi, persimmons, pomegranate, pineapple, tropical fruits, pomes, melon, mango, 
papaya, and lychee. 

[0316] Many of the most widely grown plants are field crop plants such as 
evening primrose, meadow foam, corn (field, sweet, popcorn), hops, jojoba, peanuts, rice, 
safflower, small grains (barley, oats, rye, wheat, etc.), sorghum, tobacco, kapok, leguminous 
plants (beans, lentils, peas, soybeans), oil plants (rape, mustard, poppy, olives, sunflowers, 
coconut, castor oil plants, cocoa beans, groundnuts), fibre plants (cotton, flax, hemp, jute), 
lauraceae (cinnamon, camphor), or plants such as coffee, sugarcane, tea, and natural rubber 
plants. 

[0317] Still other examples of plants include bedding plants such as flowers, 
cactus, succulents and ornamental plants, as well as trees such as forest (broad-leaved trees 
and evergreens, such as conifers), fruit, ornamental, and nut-bearing trees, as well as shrubs 
and other nursery stock. 

XVI. ANIMALS 

[0318] The term "animal," as used herein, refers to any type of animal. The 
inventors have provided below an exemplary description of some animals that may be used 
with the invention. However, the list is not in any way limiting, as other types of animals 
will be known to those of skill in the art and could be used with the invention. 

[0319] For the purpose of the instant invention, the term animal is expressly 
construed to include humans. 

[0320] In addition to humans, other animals of importance in the context of the 
instant invention are those animals deemed of commercial relevance. Animals of commercial 
relevance specifically include domesticated species including companion and agricultural 
species. 

XVII. BACTERIA 

[0321] The present invention is useful in sequencing the genome of bacteria. 
Bacteria is herein defined as a unicellular prokaryote. Examples include, but are not limited 
to, the 83 or more distinct serotypes of pneumococci, streptococci such as S. pyogenes, S. 
agalactiae, S. equi, S. canis, S. bovis, S. equinus, S. anginosus, S. sanguis, S. salivarius, S. 
mitis, S. mutans, other viridans streptococci, peptostreptococci, other related species of 
streptococci, enterococci such as Enterococcus faecalis, Enterococcus faecium, 
staphylococci, such as Staphylococcus epidermidis, Staphylococcus aureus, Hemophilus 
influenzae, pseudomonas species such as Pseudomonas aeruginosa, Pseudomonas 
pseudomallei, Pseudomonas mallei, brucellas such as Brucella melitensis, Brucella suis, 
Brucella abortus, Bordetella pertussis, Neisseria meningitidis, Neisseria gonorrhoeae, 
Moraxella catarrhalis, Corynebacterium diphtheriae, Corynebacterium ulcerans, 
Corynebacterium pseudotuberculosis, Corynebacterium pseudodiphtheriticum, 
Corynebacterium urealyticum, Corynebacterium hemolyticum, Corynebacterium equi, etc., 
Listeria monocytogenes, Nocardia asteroides, Bacteroides species, Actinomycetes species, 
Treponema pallidum, Leptospirosa species and related organisms. The invention may also be 
useful for determining genomic sequences of gram negative bacteria such as Klebsiella 
pneumoniae, Escherichia coli, Serratia species, Acinetobacter, Francisella tularensis, 
Enterobacter species, Bacteroides and the like. 

[0322] Other bacteria species include Bacteroides forsythus, Porphyromonas 
gingivalis, Prevotella intermedia and Prevotella nigrescens, Actinobacillus 
actinomycetemcomitans, Actinomyces, A. viscosus, A. naeslundii, Bacteroides forsythus, 
Streptococcus intermedius, Campylobacter rectus and Campylobacter jejuni, 
Peptostreptococcus, Eikenella corrodens, P. anaerobius, Eubacterium, P. micros, E. 
alactolyticum, E. brachy, Fusobacterium, F. alocis, F. nucleatum, Porphyromonas gingivalis, 
Prevotella, P. intermedia, P. nigrescens, Selenomonas sputigena, Treponema, T. denticola, 
and T. socranskii. 

[0323] Other bacterial species include Campylobacter species, such as 
Cryptosporidium, Giardia, Leptospira, Pasteurella, Proteus, Shigella, Vibrio species, such as 
Vibrio cholerae, V. alginolyticus, V. fluvialis, V. mimicus, V. parahaemolyticus, V. vulnificus 
and other Vibrio spp., Salmonella typhimurium, S. typhi, Proteus sp., Yersinia enterocolitica, 
Vibrio parahaemolyticus, Acinetobacter calcoaceticus, Aeromonas hydrophila, A. sobria, A. 
caviae, C. coli, Chromobacterium violaceum, Citrobacter spp., Clostridium perfringens, 
Flavobacterium meningosepticum, Francisella tularensis, Fusobacterium necrophorum, 
Legionella pneumophila and other Legionella spp., Morganella morganii, Mycobacterium 
tuberculosis, M. marinum and other Mycobacterium spp., Plesiomonas shigelloides, 
Salmonella enteritidis, S. montevideo B, S. typhimurium and other Salmonella serotypes, S. 
paratyphi A and B, S. typhi, Serratia marcescens, Enterobacter aerogenes, Proteus mirabilis, 
Proteus vulgaris, Pseudomonas aeruginosa, Streptococcus faecalis, mycobactin, Clostridium 
botulinum, Streptococcus faecalis, Proteus vulgaris, Pseudomonas aeruginosa, 
Enterobacteriaceae, Yersinia pestis, Yersinia pseudotuberculosis, Stenotrophomonas 
maltophilia, Burkholderia cepacia, Gardnerella vaginalis, Bartonella spp., Hafnia spp., 
Buttiauxella, Cedecea, Ewingella, Providencia, C. psittaci, and C. trachomatis. 

[0324] Bacterial plant pathogens include species of Agrobacteria (e.g., Agaricus 
bisporus (Lange) Imbach or Agrobacterium tumefaciens), Clavibacter, Corynebacterium, 
Erwinia (e.g., Erwinia carotovora subsp. carotovora), Pseudomonas (e.g., Pseudomonas 
tolaasii Paine, Pseudomonas solanacearum, Pseudomonas syringae pv.) and Xanthomonas 
(e.g., Xanthomonas campestris pv. malvacearum). 

EXAMPLES 

[0325] The following examples are included to demonstrate preferred 
embodiments of the invention. It should be appreciated by those of skill in the art that the 
techniques disclosed in the examples which follow represent techniques discovered by the 
inventor to function well in the practice of the invention, and thus can be considered to 
constitute preferred modes for its practice. However, those of skill in the art should, in light 
of the present disclosure, appreciate that many changes can be made in the specific 
embodiments which are disclosed and still obtain a like or similar result without departing 
from the spirit and scope of the invention. 

EXAMPLE 1 

PREPARATION AND ANALYSIS OF PENTAmer LIBRARY FROM 
E. COLI BAMH I COMPLETE GENOMIC DIGEST 

[0326] In the following examples, primary genomic PENTAmer library is defined 
as library produced from complete or partial restriction digest after ligation of 
nick-translation adaptor A from which a time-controlled nick-translation is performed, followed by 
ligation of nick-attaching adaptor B to the 3'-terminus of synthesized PENT product. 
Primary genomic libraries are highly representative since no amplification bias has been 
imposed on them. 

[0327] This example describes a protocol for preparation of primary PENTAmer 
library from E. coli genomic DNA with upstream nick-translation BamH I compatible 
adaptor A and downstream nick-attaching adaptor B having randomized bases at the strand 
used to direct ligation at the 3' end of nick-translated PENT molecules. 

[0328] Genomic DNA from E. coli MG-1655 is prepared by standard procedure. 
Ten micrograms of DNA are digested at 37°C for 4 hours with 120 units of BamH I 
restriction enzyme (NEB) in total volume of 150 µl. The sample is split into two tubes, 
diluted twice with water, supplemented with 1 x Shrimp Alkaline Phosphatase (SAP) buffer 
(Roche; Nutley, NJ), and the DNA is dephosphorylated with 10 units of SAP (Roche; Nutley, 
NJ) for 20 min at [temperature illegible]. SAP is heat-inactivated for 15 min at 65°C and DNA is purified by 
extraction with equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) followed by 
precipitation with ethanol. Digested DNA is dissolved in 50 µl of 10 mM Tris-HCl, pH 7.5. 

[0329] The sample is mixed with 3 pmoles of pre-assembled BamH I 
nick-translation adaptor (adaptor A3 consisting of primers 11, 12, and 13), and ligation is carried 
out overnight at 16°C with 1200 units of T4 ligase (NEB) in 60 µl volume. To remove ligase 
and excess free adaptor, the sample is extracted with equal volume of 
phenol:chloroform:isoamyl alcohol (25:24:1), supplemented with 1/4 volume of QF buffer 
(final concentrations of 240 mM NaCl, 3 % isopropanol, and 10 mM Tris-HCl, pH 8.5) in a 
volume of 400 µl and centrifuged at 200 x g to a volume of approximately 100 µl. The 
sample is washed 3 times with 400 µl of TE-L buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 
7.5) at 200 x g and concentrated to a final volume of 80 µl. 

TABLE VI - ADAPTOR STRUCTURES 

Adaptor A3 (Bam HI, Sau 3AI) 

(5') P gatctgaggttgttgaagcgttuacccaautcgatuaggcaa N-C7 (3') (SEQ ID NO:29) 
(3') N-C7 actccaacaacttc gcaaaugggtuaagcuaatccgtt Biotin (5') (SEQ ID NO:30) 

Adaptor B1 (Poly N universal) 

(5') P aagtctgcaagatcatcgcggaaggtgacaaagactcgtatcgtaaNNNNc N-C7 (3') (SEQ ID NO:31) 
(3') N-C7 ttcagacgttctagtagcgccttccactgtttctgagcatagcatt P (5') (SEQ ID NO:32) 

wherein N-C7 = Amino C7 Blocking group 
P = 5' phosphate 

[0330] The purified sample is subjected to nick-translation with 20 units of wild 
type Taq polymerase in 1x Perkin Elmer (Norwalk, CT) PCR buffer II containing 2 
mM MgCl2 and 200 µM of each dNTP for 5 min at 50°C. The reaction is stopped by 
addition of 5 µl of 0.5 M EDTA pH 8.0, and products are analyzed on 6% TBE-urea gel 
(Novex; San Diego, CA) after staining with Sybr Gold. 

[0331] To increase representativity of single-stranded PENT molecules bound to 
streptavidin beads and to prevent their reassociation with the strand used as template for 
nick-translation in the region of the adaptor, an oligonucleotide complementary to the template 
strand spanning the entire adaptor sequence (primer 15) is added at a final concentration of 
0.8 µM, and the sample is denatured by boiling at 100°C for 3 min and cooling on ice for 5 
min. Eight hundred micrograms of streptavidin-coated Dynabeads M-280 (Dynal) are 
prewashed with TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl, 2 mM 
EDTA, 2 M NaCl, pH 7.5). Denatured DNA is mixed with equal volume of beads suspension 
in 2x BW buffer and placed on a rotary shaker for 1 hr at room temperature. The beads are 
bound to magnet and washed with 3 x 100 µl each of 1 x BW buffer and TE-L buffer. 
Non-biotinylated DNA is removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at 
room temperature. Beads are neutralized by washing with 5 x 100 µl of TE-L buffer and 
resuspended in 20 µl of water. 

[0332] Adaptor B1 is ligated to the single-stranded library of PENT molecules 
bound to magnetic beads. Adaptor B1 consists of two oligonucleotides: one is 
5'-phosphorylated and 3'-blocked (primer 16); and a second is its complement, which has a 
3'-extension of four random bases and is also 3'-blocked (primer 17). The latter oligonucleotide 
will anneal and direct the phosphorylated adaptor strand to the free 3'-end of single-stranded 
genomic PENT library molecules. The library DNA from the previous step is mixed with 40 
pmoles of each adaptor B1 oligonucleotide (primers 16 and 17) in 1x T4 ligase buffer and 
1200 units of T4 ligase (NEB) in final volume of 30 µl. Ligation is performed at room 
temperature for 1 hour on an end-to-end rotary shaker to keep the beads in suspension. Beads 
are bound to magnet, washed with 2 x 100 µl each of 1 x BW buffer and TE-L buffer and 
nonbiotinylated DNA molecules are removed by incubating the beads in 100 µl of 0.1 N 
NaOH for 5 min at room temperature. Beads are neutralized by washing with 5 x 100 µl of 
TE-L buffer, resuspended in 100 µl of storage buffer (SB buffer, containing 0.5 M NaCl, 10 
mM Tris-HCl, 10 mM EDTA, pH 7.5) and stored at 4°C. 
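
The two-strand design of adaptor B1 can be illustrated with a short Python sketch. It checks that the phosphorylated strand (primer 16) is the reverse complement of the fixed portion of the poly N strand (primer 17) and enumerates the possible four-base random extensions. The sequences are transcribed from Table VI; the helper names, and the assumption that the 40 pmoles of primer 17 are spread evenly over all extension variants, are illustrative only and are not part of the protocol.

```python
from itertools import product

COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Sequences read 5'->3'; spaces are kept only for readability and then removed.
primer16 = "tta cga tac gag tct ttg tca cct tcc gcg atg atc ttg cag act t".replace(" ", "")
primer17_fixed = "aag tct gca aga tca tcg cgg aag gtg aca aag act cgt atc gta a".replace(" ", "")

# The fixed part of primer 17 should pair with the whole of primer 16.
strands_pair = reverse_complement(primer16) == primer17_fixed

# Every possible NNNN extension of primer 17.
extensions = ["".join(bases) for bases in product("acgt", repeat=4)]

# Assumption: equimolar random synthesis, so each variant is an equal share of 40 pmol.
pmol_per_variant = 40 / len(extensions)

print(strands_pair, len(extensions), pmol_per_variant)
```

Under these assumptions only a small share of the poly N strand carries any one extension, which may be one reason the adaptor is supplied in excess over the library molecules.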

[0333] FIG. 20 shows analysis of 5 selected random sequences in the E. coli 
genome adjacent to BamH I sites to assess the quality and representativity of the library. One 
microliter of library beads diluted 10 x in water (approximately 0.1 % of the total library 
DNA) is used as template in PCR amplification reactions with universal adaptor B1 primer 
(primer 18) and 5 specific E. coli primers adjacent to BamH I sites. A negative control with 
adaptor B1 primer alone and a positive control with adaptor B1 and adaptor A3 primers 
(primers 14 and 18) are also included. After initial denaturing at 95°C for 1 min, 30 cycles of 
94°C for 10 sec and 68°C for 75 sec are carried out. Aliquots of the PCR reactions are 
separated on 1% agarose gel and visualized on Fluor S MultiImager (Bio Rad) after staining 
with Sybr Gold. All five analyzed E. coli sequences are present in the library and are 
amplified as 1 Kb fragments. The sequences are confirmed by Thermo Sequenase Cy5.5 Dye 
Terminator Cycle Sequencing kit (Amersham Pharmacia Biotech; Piscataway, NJ) protocol 
on OpenGene sequencing system (Visible Genetics) as described in Example 6 with the same 
kernel primers used in PCR. 
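
The figure of approximately 0.1 % of the library per PCR reaction follows from the volumes given above. A minimal worked check in Python, assuming the beads are evenly suspended in the 100 µl of storage buffer and that no material is lost (variable names are illustrative):

```python
# Volumes taken from Example 1; the variable names are illustrative.
library_volume_ul = 100.0   # beads resuspended in 100 µl of storage buffer
dilution_factor = 10.0      # beads diluted 10 x in water before use
template_volume_ul = 1.0    # volume of diluted beads added to each PCR

# Volume of undiluted library that ends up in one reaction.
undiluted_equivalent_ul = template_volume_ul / dilution_factor

# Share of the whole library used per reaction, as a percentage.
percent_of_library = 100.0 * undiluted_equivalent_ul / library_volume_ul

print(f"{percent_of_library:.2f} % of the library per reaction")
```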

EXAMPLE 2 

PREPARATION OF SECONDARY E. COLI GENOMIC BAMH I 
PENTAMER LIBRARY 

[0334] Secondary library in the following examples is defined as a library derived 
from primary genomic PENTAmer library by either exponential or linear amplification, 
which is primarily used as template for selection by ligation and/or extension directed from 
adaptor A toward adaptor B and thus for the purpose of this application is the strand 
complementary to the PENT (nick-translation) strand of the primary library from which it is 
derived. Secondary libraries are potentially biased in representation of genomic sequences. 

[0335] This example describes the preparation of secondary library derived by 
PCR amplification of the primary PENTAmer E. coli BamH I library described in Example 1. 
The library is diluted and amplified by PCR in the presence of dUTP and biotinylated B1 
adaptor oligonucleotide. Biotinylated dU containing strands are captured to magnetic 
streptavidin beads. Finally, to prevent the free 3'-ends from self-priming during primer 
extension reactions, 3'-ends are blocked by transfer of dideoxy adenosine with terminal 
transferase. The library is used as template for selection by assembly, ligation, and extension 
of contigs of short oligonucleotides at specific positions or for direct primer extension of 
kernel sequences. 

[0336] One microliter of primary PENTAmer E. coli BamH I genomic library 
beads diluted 10 times in water (approximately 0.1 % of the total primary library) is used as 
PCR template with biotinylated adaptor B1 primer (primer 19) and adaptor A3 PCR primer 
(primer 14) in the presence of 0.2 mM of each dNTP and 0.2 mM dUTP. After 25 cycles at 
94°C for 10 sec and 62°C for 75 sec, three reaction tubes of 25 µl each are combined. The 
sample is diluted to 300 µl with TE-L buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 7.5), 
supplemented with 1/4 volume of QF buffer (final concentrations of 240 mM NaCl, 3 % 
isopropanol, and 10 mM Tris-HCl, pH 8.5) and centrifuged at 200 x g in Microcon YM-100 
(Millipore; Bedford, MA) filter to a volume of 100 µl. The sample is then washed 2 times 
with 400 µl of TE-L buffer at 200 x g and concentrated to a final volume of 120 µl. Three 
hundred micrograms of streptavidin-coated Dynabeads M-280 (Dynal) are prewashed with 
TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl, 2 mM EDTA, 2 M NaCl, 
pH 7.5). The DNA sample is mixed with equal volume of beads suspension in 2x BW buffer 
and placed on rotary shaker for 1 hr at room temperature. The beads are bound to magnet 
and washed with 3 x 100 µl each of 1 x BW buffer and TE-L buffer. Non-biotinylated DNA 
is removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at room temperature. 
Beads are neutralized by washing with 5 x 100 µl of TE-L buffer and then resuspended in 20 
µl of water. 

[0337] To block free 3' termini the beads are supplemented with 1x terminal 
transferase buffer (Roche; Nutley, NJ), 0.25 mM CoCl2, 0.1 mM ddATP, and 200 units of 
terminal transferase (NEB) in a final volume of 50 µl and reaction is carried out at 37°C for 
30 min. Beads are washed with 2 x 100 µl each of TE-L buffer and 1 x BW buffer, 
resuspended in 50 µl of SB buffer (0.5 M NaCl, 10 mM Tris-HCl, 10 mM EDTA, pH 7.5) 
and stored at 4°C. 

EXAMPLE 3 

ASSEMBLY OF SHORT OLIGONUCLEOTIDES AT SPECIFIC E. COLI GENOMIC 
KERNEL SEQUENCE BY THERMO-STABLE DNA LIGASE USING SECONDARY 
E. COLI GENOMIC BAMH I PENTAMER LIBRARY AS TEMPLATE 

[0338] This example describes the assembly of contigs of 5 or 8 nonamer 
oligonucleotides at specific E. coli kernel sequence adjacent to BamH I restriction site by 
using thermo-stable ligase and secondary E. coli genomic BamH I PENTAmer library 
described in Example 2 as template. 

[0339] Two sets of oligonucleotides complementary to a kernel sequence adjacent 
to BamH I restriction site are mixed in 1 x Tsc ligase buffer (Roche; Nutley, NJ) as follows: 

[0340] Set 1. Oligonucleotides 1, 2, 3, 4, and 5 annealing at the selected kernel as 
contig (FIG. 21A, Table VII) are mixed at final concentration of 10 nM each, except 
oligonucleotide 5, at 50 nM. Oligonucleotide 1 is complementary in its twelve 3'-terminal 
bases to adaptor A3 sequence immediately upstream from the BamH I restriction site and has 
a unique 5' extension of 23 bases used as PCR priming site. Oligonucleotide 5 is 
complementary in its nine 5'-terminal bases to the sequence being selected and has a unique 
3'-extension of 23 bases used as second priming site for PCR. All oligonucleotides except 
oligonucleotide 1 are 5'-phosphorylated. 

[0341] Set 2. Oligonucleotides 1, 2, 3, 4, 5A, 6, 7 and 8 annealing at the selected 
kernel as contig (FIG. 21B, Table VII) are mixed at final concentration of 10 nM each except 
oligonucleotides 5A and 8, at 50 nM. Oligonucleotide 1 is complementary in its twelve 
3'-terminal bases to adaptor A3 sequence immediately upstream from the BamH I restriction site 
and has a unique 5' extension of 23 bases used as PCR priming site. Oligonucleotide 8 is 
complementary in its nine 5'-terminal bases to the sequence being selected and has a unique 
3'-extension (identical to the extension of oligonucleotide 5) of 23 bases used as second 
priming site for PCR. All oligonucleotides except oligonucleotide 1 are 5'-phosphorylated. 
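
The two-part structure of oligonucleotide 1 can be checked against the sequences listed in Tables VI and VII. The Python sketch below splits oligonucleotide 1 into its 23-base 5' extension and its twelve 3'-terminal bases, compares the extension with primer 9, and locates the annealing site within the first 20 bases of the adaptor A3 strand. The sequences are transcribed from the tables; the helper names are illustrative, and only the leading part of adaptor A3, which contains no dU positions, is used.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


oligo1 = "cgg tgc atg tgt atc gtc cga gtt caa caa cct ca".replace(" ", "")
primer9 = "cgg tgc atg tgt atc gtc cga gt".replace(" ", "")

# First 20 bases of the adaptor A3 strand given in Table VI (5'->3').
adaptor_a3_start = "gatctgaggttgttgaagcg"

tail_part = oligo1[:-12]     # unique 5' extension used as PCR priming site
anneal_part = oligo1[-12:]   # twelve 3'-terminal bases that pair with adaptor A3

# Sequence on the adaptor strand that the annealing part pairs with.
site = reverse_complement(anneal_part)
site_start = adaptor_a3_start.find(site)

print(len(oligo1), len(tail_part), tail_part == primer9, site, site_start)
```

In this reading the annealing site begins directly after the leading gatc of the adaptor strand, which agrees with the statement that oligonucleotide 1 pairs with adaptor A3 sequence immediately next to the BamH I site.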



TABLE VII. OLIGONUCLEOTIDES* 

Number | Sequence (5'-3') | Length (bases) and Modifications | Application 
1. | cgg tgc atg tgt atc gtc cga gtt caa caa cct ca (SEQ ID NO:1) | 35 | Universal primer for selection by ligation 
2. | gat ccc cat (SEQ ID NO:2) | 9 | selective contig assembly 
3. | ttc cag acg (SEQ ID NO:3) | 9† | selective contig assembly 
4. | ata agg ctg (SEQ ID NO:4) | 9 | selective contig assembly 
5. | cat taa atc atc gca gta gca ttg act cag cc (SEQ ID NO:5) | 32† | selective contig assembly with unique 3' extension 
5A. | cat taa atc (SEQ ID NO:6) | 9 | selective contig assembly 
6. | gag cgg gcg (SEQ ID NO:7) | 9† | selective contig assembly 
7. | cag tac gcc (SEQ ID NO:8) | 9† | selective contig assembly 
8. | ata caa gcc atc gca gta gca ttg act cag cc (SEQ ID NO:9) | 32† | selective contig assembly with unique 3' extension 
8A. | ata caa gcc (SEQ ID NO:10) | 9† | selective contig assembly 
9. | cgg tgc atg tgt atc gtc cga gt (SEQ ID NO:11) | 23 | Upstream PCR primer used to amplify sequences selected by assembly of short oligos 
10. | ggc tga gtc aat gct act gcg at (SEQ ID NO:12) | 23 | Downstream PCR primer used to amplify sequences selected by assembly of short oligos 
11. | gat ctg agg ttg ttg aag cgt tua ccc ... (SEQ ID NO:13) | | Adaptor A3 backbone 
12. | ttg cct aau cga aut ggg uaa acg (SEQ ID NO:14) | 24† | Adaptor A3 nick-translation primer 
13. | ctt caa caa cct ca (SEQ ID NO:15) | 14† | Adaptor A3 blocking primer 
14. | ttg cct aat cga att ggg taa acg (SEQ ID NO:16) | 24 | Adaptor A3 PCR primer 
15. | ttg cct aat cga att ggg taa acg ctt caa caa cct cag atc (SEQ ID NO:17) | 42† | Adaptor A3 backbone complement block 
16. | tta cga tac gag tct ttg tca cct tcc gcg atg atc ttg cag act t (SEQ ID NO:18) | 46 | Adaptor B1 phosphorylated strand 
17. | aag tct gca aga tca tcg cgg aag gtg aca aag act cgt atc gta aNNNNc (SEQ ID NO:19) | 51† | Adaptor B1 poly N strand 
18. | aag tct gca aga tca tcg cgg aa (SEQ ID NO:20) | 23 | Adaptor B1 distal PCR primer 
19. | aag tct gca aga tca tcg cgg aa (SEQ ID NO:21) | 23† | Adaptor B1 PCR primer with 5' biotin 
20. | acg ggc tag caa aat agc gct gtc c(N)g atc tga ggt tgt tga agc g (SEQ ID NO:22) | 46† | Blocking primer to prevent adaptor A3-B1 dimers formation 
21. | gga cag cgc tat ttt gct agc ccg t (SEQ ID NO:23) | 25† | Blocking primer to prevent adaptor A3-B1 dimers formation 
22. | ggt gac aaa gac tcg tat cgt aa (SEQ ID NO:24) | 23 | Adaptor B1 proximal PCR primer 
23. | ttg cct aat cga att ggg taa acg (SEQ ID NO:25) | 24 | Adaptor A3 PCR primer 
24. | gat ctg agg ttg ttg aag cgt tta ccc aat tcg att agg caa agg tct gca aga tca tcg (SEQ ID NO:26) | 60† | Bridging oligonucleotide for circularization of single-stranded PENTAmer libraries 
25. | tta ccc aat tcg att agg caa (SEQ ID NO:27) | 21 | Adaptor A3 circular PCR primer 
26. | cgc ttc aac aac ctc aga tc (SEQ ID NO:28) | 20 | Adaptor A3 circular PCR primer 

*All oligonucleotides are synthesized at Integrated DNA Technologies. 
† Modified oligonucleotide; the modifications used are 5' Cy 5.0 label, 5' biotin, 
5' phosphate, N random base, 3' C7 amino block, and 5' fluorescein label. 


[0342] Three microliters of 2.5-fold diluted secondary E. coli genomic BamH I 
PENTAmer library beads prepared as described in Example 2 are added to the prepared sets 
of oligonucleotides together with 7.5 units of Tsc ligase (Roche; Nutley, NJ) or 1 x Tsc buffer 
as control in final volume of 30 µl. Incubation is carried out at 32°C or 45°C for 3 hours. 
Beads are washed 2 times with 50 µl each of 2x BW buffer and TE-L buffer and 
non-biotinylated DNA is eluted with 20 µl of 0.1 N NaOH for 3 min at 37°C. Beads are bound to 
magnet and supernatants neutralized with 10 µl of 0.2 N HCl and 3 µl of 1 M Tris-HCl, pH 
8.0. Samples are diluted to 100 µl with water, split in 2 aliquots of 50 µl and one aliquot is 
treated with 1 unit of heat-labile uracil-DNA glycosylase (UDG, Roche; Nutley, NJ) for 2 
hours at 20°C. UDG is inactivated for 10 min at 95°C and 1 µl of 3-fold diluted aliquot of 
each sample is used as template for PCR with primer identical to the unique 5' extension of 
oligonucleotide 1 (primer 9) and primer complementary to the unique 3' extension of 
oligonucleotides 5 and 8 (primer 10). 
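
The neutralization step above can be verified with simple arithmetic: the acid added is equivalent to the base used for elution, and the Tris-HCl then buffers the mixture. A short Python check, using only the volumes and normalities stated in the paragraph (variable names are illustrative, and the bead volume is ignored):

```python
import math

# Elution and neutralization volumes (µl) and normalities (eq/l) from the text.
naoh_ul, naoh_normality = 20.0, 0.1
hcl_ul, hcl_normality = 10.0, 0.2
tris_ul = 3.0

# µl x eq/l gives micro-equivalents.
naoh_ueq = naoh_ul * naoh_normality
hcl_ueq = hcl_ul * hcl_normality

neutralized = math.isclose(naoh_ueq, hcl_ueq)

# Volume before dilution, and water needed to reach the stated 100 µl.
volume_before_dilution_ul = naoh_ul + hcl_ul + tris_ul
water_to_add_ul = 100.0 - volume_before_dilution_ul

print(naoh_ueq, hcl_ueq, neutralized, volume_before_dilution_ul, water_to_add_ul)
```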

[0343] FIG. 22 shows analysis of 10 µl aliquots of the PCR reactions by 
electrophoresis on 10% TBE acrylamide gel (Novex; San Diego, CA) after staining with Sybr 
Gold on Bio-Rad (Hercules, CA) Fluor S MultiImager. Both 5 oligonucleotide and 8 
oligonucleotide contigs were assembled as evidenced by 94 bp and 121 bp amplicons 
obtained by PCR respectively. 
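
The 94 bp and 121 bp amplicon sizes agree with the summed lengths of the ligated oligonucleotides in each set. The following Python sketch joins the Table VII sequences in order and checks that each ligated contig begins with the primer 9 sequence and ends with the complement of primer 10. It is an illustration only; the helper names are not part of the protocol, and the sequences are transcribed from Table VII.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def clean(seq):
    """Remove the spaces used to group bases in Table VII."""
    return seq.replace(" ", "")


OLIGOS = {
    "1": clean("cgg tgc atg tgt atc gtc cga gtt caa caa cct ca"),
    "2": clean("gat ccc cat"),
    "3": clean("ttc cag acg"),
    "4": clean("ata agg ctg"),
    "5": clean("cat taa atc atc gca gta gca ttg act cag cc"),
    "5A": clean("cat taa atc"),
    "6": clean("gag cgg gcg"),
    "7": clean("cag tac gcc"),
    "8": clean("ata caa gcc atc gca gta gca ttg act cag cc"),
}
PRIMER_9 = clean("cgg tgc atg tgt atc gtc cga gt")
PRIMER_10 = clean("ggc tga gtc aat gct act gcg at")


def ligated_contig(names):
    """Join the named oligonucleotides 5'->3' as if fully ligated."""
    return "".join(OLIGOS[name] for name in names)


def amplifiable(contig):
    """True if primer 9 matches the 5' end and primer 10 can prime on the 3' end."""
    return contig.startswith(PRIMER_9) and contig.endswith(reverse_complement(PRIMER_10))


set1 = ligated_contig(["1", "2", "3", "4", "5"])
set2 = ligated_contig(["1", "2", "3", "4", "5A", "6", "7", "8"])

for label, contig in (("Set 1", set1), ("Set 2", set2)):
    print(label, len(contig), "bases, amplifiable:", amplifiable(contig))
```

A contig missing any internal nonamer would lack either continuity or one of the two priming sites, so an amplicon of the full expected length is consistent with complete assembly.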

[0344] This example demonstrates that contigs of short oligonucleotides can be 
successfully assembled at specific kernel positions using secondary E. coli PENTAmer 
library as template. Assembled contigs are stable upon washing in low salt buffer (TE-L) and 
can be extended with DNA polymerase at high temperature as shown in Example 4. Selected 
sequences can be used for walking, sequencing, and for gap filling after destroying any 
residual dU-containing PENTAmer molecules with uracil DNA glycosylase. 

EXAMPLE 4 

SELECTION OF SPECIFIC E. COLI PENTAMER SEQUENCE BY ASSEMBLY OF 
SHORT OLIGONUCLEOTIDES FOLLOWED BY EXTENSION WITH DNA 
POLYMERASE AND LIGATION OF UNIVERSAL OLIGONUCLEOTIDE AT 
ADAPTOR A USING SECONDARY E. COLI GENOMIC BAMH I PENTAMER 
LIBRARY AS TEMPLATE 

[0345] This example describes amplification of specific E. coli PENTAmer 
sequence by assembly of short oligonucleotides, followed by extension and ligation of 
universal adaptor A oligonucleotide having unique 5 '-terminal extension used as priming site 
for PCR. 

[0346] Oligonucleotides 2, 3, 4, 5A, 6, 7 and 8A annealing as contig at specific 
kernel sequence adjacent to BamH I restriction site (Example 3, FIG. 21B) are mixed in 1 x 
Tsc ligase buffer (Roche; Nutley, NJ) at final concentration of 10 nM each except 
oligonucleotides 5A and 8A, at 50 nM. All oligonucleotides are 5'-phosphorylated. Four 
microliters of 2.5-fold diluted secondary E. coli genomic BamH I PENTAmer library beads 
prepared as described in Example 2 are added to the oligonucleotide mix in total volume of 
100 µl. The sample is divided into 3 aliquots. 7.5 units of Tsc DNA ligase (Roche; Nutley, 
NJ) are added to tube #1 and tube #2 whereas tube #3 (control) receives 1.5 µl of 1 x Tsc 
ligase buffer. Incubation is carried out at 45°C for 2 hours. Beads are washed 2 times with 50 
µl each of 2x BW buffer and TE-L buffer and resuspended in 5 µl of water. Samples are 
then supplemented with 1 x ThermoPol buffer (NEB), 10 mM MgCl2, 5 units of Bst DNA 
polymerase (NEB) and 0.2 mM of each dNTP in final volume of 60 µl and extension 
reaction is carried out at 55°C for 3 min. Reactions are stopped by addition of 1 µl of 0.5 M 
EDTA, pH 8.0 and beads are washed with 2 x 50 µl of 2x BW buffer, 2 x 50 µl of TE-L 
buffer and 50 µl of water. Beads are then resuspended in 25 µl of water. 

[0347] Samples are supplemented with 1 x Tsc ligase buffer (Roche; Nutley, NJ) 
and 10 nM of oligonucleotide 1 (Table VII) in final volume of 30 µl. Oligonucleotide 1 is 
complementary in its twelve 3'-terminal bases to adaptor A3 sequence adjacent to the 
assembled contig and has a unique 5' extension of 23 bases used later as PCR priming site. 
Five units of Tsc DNA ligase (Roche; Nutley, NJ) are added to samples #1 and #3 whereas 
sample #2 receives 1 µl of 1 x Tsc ligase buffer. Ligation is carried out at 45°C for 1 hour. 
Beads are washed sequentially with 2 x 50 µl of 2x BW buffer, 2 x 50 µl TE-L buffer, 50 µl 
of water, 2 x 50 µl of 2x BW buffer, and 50 µl of TE-L buffer. Non-biotinylated DNA is 
eluted with 20 µl of 0.1 N NaOH for 3 min at 37°C. Beads are removed on magnet and 
supernatant is neutralized with 10 µl of 0.2 N HCl and 3 µl of 1 M Tris-HCl, pH 8.0. 
Samples are diluted to 100 µl with water, split into two aliquots of 50 µl and one half treated 
with 1 unit of heat-labile uracil-DNA-glycosylase (UDG, Roche; Nutley, NJ) for 2 hours at 
20°C. UDG is inactivated for 10 min at 95°C and 1 µl of 3-fold diluted aliquot of each 
sample is used as template for PCR. Amplification is performed with primer identical to the 
unique 5' extension of oligonucleotide 1 (primer 9) or kernel primer adjacent to the BamH I 
site of the selected PENTAmer and universal adaptor B1 primer (primer 18). 

[0348] FIG. 23 shows analysis of 12 µl aliquots of the PCR reactions by 
electrophoresis on 10% TBE acrylamide gel (Novex; San Diego, CA) after staining with Sybr 
Gold performed on Bio-Rad (Hercules, CA) Fluor S MultiImager. PCR amplification with 
both sets of primers from samples which have the contig of 9-mer oligonucleotides ligated 
produced a 1 Kb amplicon corresponding to the specific PENTAmer (lanes 1, 3, and 9). The 
control (tube #3) in which short oligos are present but no ligase is added does not have the 
amplicon, indicating that no extension from short oligos occurs in the absence of ligation 
(lanes 5 and 13). The sample which did not have adaptor A tailed oligonucleotide ligated 
(tube #2) is negative when probed by PCR with the tail primer 9 (lane 11). This validates the 
specificity of the second ligation step. In all controls in which dU containing strands have 
not been destroyed by uracil glycosylase, non-specific PENTAmers are amplified indicating 
release of some biotinylated strands by NaOH treatment (lanes 2, 4, 6, 10, 12, and 14). 

[0349] This example demonstrates that contigs of short oligonucleotides can be
successfully assembled and extended at specific kernel positions using an E. coli PENTAmer
library as template. Ligation of a universal adaptor A oligonucleotide with a unique 5'-tail and
destruction of the dU-containing PENTAmer with uracil glycosylase provide an additional level
of selective specificity.

EXAMPLE 5

PREPARATION AND ANALYSIS OF PRIMARY PENTAMER LIBRARY
FROM E. COLI SAU3A I PARTIAL GENOMIC DIGEST

[0350] This Example describes preparation of a primary PENTAmer library from E.
coli genomic DNA using partial digest with a frequently cutting enzyme. As shown in the
following examples, this library can be used for filling gaps and de novo sequencing of
genomes having the complexity of an average bacterial genome.

[0351] After performing an experiment to test the efficiency of partial restriction
digestion, aliquots of 2 µg of E. coli genomic DNA prepared by standard purification are
digested in three separate tubes with 4, 2, or 1 unit(s) of Sau3A I (New England Biolabs;
Beverly, MA) for 20 min at 37°C in a final volume of 100 µl. Samples are combined and
DNA fragments are size-fractionated by Reverse Field Isodimensional Focusing (RF-IDF)
electrophoresis. The combined sample is loaded in a preparative lane on a 0.55% pulse-field
grade agarose gel (Bio-Rad; Hercules, CA) along with 1 Kb+ ladder (Life Technologies;
Rockville, MD). Electrophoresis in the forward direction is performed at 6 V/cm in interrupted
mode (60 sec on, 5 sec off) for 1.5 hours. A section of the gel containing a lane of standards
and a lane of the DNA sample is excised, stained with Sybr Gold, and bands are visualized on a
Dark Reader Blue Light Transilluminator (Clare Chemical Research). The region of the stained
slice containing DNA molecules smaller than 2 Kb is cut out and removed. The remaining portion
of the stained slice is aligned back with the unstained gel and used as a landmark for cutting
and removing the fraction containing DNA fragments below 2 Kb. The unstained gel is then
run in the reverse direction in an interrupted field of 6 V/cm (60 sec on, 5 sec off) for 85% of
the forward time. After electrophoresis is complete the gel is stained with Sybr Gold. The band
of interest, now focused in a sharp narrow region, is cut out and recovered from the agarose
using a Gel Extraction kit (Qiagen; Valencia, CA) in 10 mM Tris-HCl pH 8.5.
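
The reverse-field step above is timed as a fixed fraction of the forward run. A minimal Python sketch of that arithmetic follows; the function name and the default fraction argument are illustrative assumptions, and only the 85% figure and the 1.5 hour forward run come from the text.

```python
def reverse_run_hours(forward_hours: float, reverse_fraction: float = 0.85) -> float:
    """Return the reverse-field run time as a fraction of the forward run time.

    reverse_fraction defaults to 0.85, the "85% of the forward time" stated
    in this Example. The function itself is a hypothetical helper.
    """
    if forward_hours <= 0:
        raise ValueError("forward_hours must be positive")
    if not 0 < reverse_fraction <= 1:
        raise ValueError("reverse_fraction must be in (0, 1]")
    return forward_hours * reverse_fraction


if __name__ == "__main__":
    forward = 1.5  # hours, forward run in this Example
    reverse = reverse_run_hours(forward)
    print(f"forward {forward} h -> reverse {reverse:.3f} h ({reverse * 60:.1f} min)")
```

Running the reverse field for slightly less than the forward time stops the retained fragments short of their starting position.
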

[0352] The sample is split into two tubes, supplemented with 1 x SAP buffer
(Roche; Nutley, NJ), and DNA is dephosphorylated with 15 units of SAP (Roche; Nutley,
NJ) for 20 min at 37°C. SAP is heat-inactivated for 15 min at 65°C, and DNA is purified by
extraction with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) and
precipitation with ethanol. Digested DNA is dissolved in 100 µl of TE-L buffer.

[0353] The sample is mixed with 40 pmoles of pre-assembled BamH I nick-translation
adaptor (adaptor A3 consisting of primers 11, 12, and 13; Table VI) and ligation is
carried out overnight at 16°C with 2,800 units of T4 ligase (NEB). To remove ligase and
excess free adaptor the sample is extracted with an equal volume of phenol:chloroform:isoamyl
alcohol (25:24:1), mixed with 1/4 vol of QF buffer (final concentrations of 240 mM NaCl,
3% isopropanol, and 10 mM Tris-HCl, pH 8.5) in a volume of 400 µl and centrifuged at
200 x g to a volume of approximately 100 µl on a Microcon YM-100. The sample is washed 3
times with 400 µl of TE-L buffer at 200 x g and concentrated to a final volume of 135 µl.

[0354] The purified sample is subjected to nick-translation with 38 units of wild
type Taq polymerase in 1x Perkin Elmer (Norwalk, CT) PCR buffer II containing 4
mM MgCl2 and 200 µM of each dNTP in a final volume of 240 µl for 5 min at 50°C. The reaction
is stopped by addition of 6 µl of 0.5 M EDTA pH 8.0 and products are analyzed on a 6% TBE-urea
gel (Novex; San Diego, CA) after staining with Sybr Gold.

[0355] The sample is supplemented with a blocking oligonucleotide complementary
to the nick-translation template strand adaptor sequence (primer 15) at a final concentration
of 1 µM, denatured by boiling at 100°C for 3 min, and cooled on ice for 5 min. Twelve
hundred micrograms of streptavidin coated Dynabeads M-280 (Dynal) are prewashed with
TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl, 2 mM EDTA, 2 M NaCl,
pH 7.5). Denatured DNA is mixed with an equal volume of bead suspension in 2x BW buffer
and placed on a rotary shaker for 2 hr at room temperature. The beads are bound to a magnet
and washed with 2 x 100 µl each of 1 x BW buffer and TE-L buffer. Non-biotinylated DNA
is removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at room temperature.
Beads are washed with 100 µl of 0.1 N NaOH, neutralized by washing with 5 x 100 µl of TE-L
buffer, and resuspended in 150 µl of TE-L buffer.

[0356] One half of the prepared library DNA is then processed for ligation with
adaptor B1. To minimize formation of adaptor A-B dimers on magnetic beads, the
suspension (75 µl) is supplemented with 1x T4 ligase buffer (NEB) and incubated with 50 pmoles
of 3'-blocked oligonucleotides, one of which is complementary to the biotinylated adaptor A
strand and has a 3'-extension of 24 bases (primer 20) to which the second oligonucleotide
(primer 21) is complementary. The suspension is heated for 1 min at 60°C, cooled to room
temperature and incubated for 10 min at room temperature to anneal the blocking
oligonucleotides to residual free adaptor A3 molecules bound to magnetic beads. Beads are
then washed with 50 µl of 1x T4 ligase buffer and resuspended in 50 µl of the same buffer.
Adaptor B1 is then ligated to the library DNA. The sample from the previous step is
supplemented with 40 pmoles of each adaptor B oligonucleotide (primers 16 and 17) in 1x T4
ligase buffer and 4000 units of T4 ligase (NEB) in a final volume of 55 µl. Ligation is
performed at room temperature for 3 hours on an end-to-end rotary shaker. Beads are bound to
a magnet and washed with 2 x 100 µl each of 1 x BW buffer and TE-L buffer, and non-biotinylated
DNA is removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at room
temperature. Beads are washed with 100 µl of 0.1 N NaOH, neutralized by washing with 5 x
100 µl of TE-L buffer, resuspended in 90 µl of SB buffer and stored at 4°C.

[0357] Representativity of the PENTAmer library from E. coli Sau3A I partial
genomic digest is analyzed by PCR amplification with 50 random kernel primers and
universal adaptor B1 primer. Kernel primers specific for regions of the E. coli genome
located approximately 50-250 bp downstream of Sau3A I restriction sites are designed to
have high internal stability and low frequency of their six 3'-terminal bases matched against
the E. coli genomic frequency database (Oligo Primer Analysis software, Molecular Biology
Insights). Magnetic beads containing library DNA are prewashed with water and 1 µl (1.1%
of the total library DNA) is used as template for PCR amplification with 100 nM of universal
adaptor B primer (primer 18) and 100 nM of each E. coli kernel primer in a final volume of
25 µl. After initial denaturing at 95°C for 1 min, 32 cycles are carried out at 94°C for 10 sec
and 68°C for 75 sec. Five µl aliquots are separated on a 1% agarose gel and visualized on a
Fluor S MultiImager (Bio-Rad) after staining with Sybr Gold. FIG. 24 shows the
amplification patterns obtained with 40 representative kernel primers. The bands of different
size in each lane correspond to amplified PENTAmers having the kernel sequence at different
positions relative to the nick-translation termination sites (ligated adaptor B1). Although
PENTAmer molecules are size-fractionated and are all in the range of 1 Kb, the relative
position of any kernel sequence will be shifted in individual PENT molecules originating at
a given Sau3A I restriction site. Thus the pattern of amplification reflects the frequency of
Sau3A I sites located upstream from each kernel.
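
The placement of kernel primers relative to Sau3A I sites can be illustrated with a short Python sketch. It is not the Oligo Primer Analysis procedure used here: it only locates Sau3A I recognition sites (GATC) on one strand and reports the window 50-250 bp downstream of each site. The function names, the single-strand treatment, and measuring offsets from the end of the recognition sequence are all assumptions made for illustration.

```python
import re

SAU3A_SITE = "GATC"  # Sau3A I recognition sequence


def sau3a_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every GATC on the given strand."""
    # A zero-width lookahead reports each occurrence without consuming it.
    return [m.start() for m in re.finditer(f"(?={SAU3A_SITE})", seq.upper())]


def kernel_windows(seq: str, min_offset: int = 50, max_offset: int = 250):
    """Return (site, window_start, window_end) tuples, half-open coordinates.

    Each window spans min_offset..max_offset bases downstream of the end of
    a site and is clipped to the sequence length. Offsets are measured from
    the end of the recognition sequence, which is an assumption.
    """
    windows = []
    for site in sau3a_sites(seq):
        site_end = site + len(SAU3A_SITE)
        start = site_end + min_offset
        end = min(site_end + max_offset, len(seq))
        if start < end:
            windows.append((site, start, end))
    return windows


if __name__ == "__main__":
    toy = "A" * 10 + "GATC" + "A" * 300  # toy sequence, not E. coli data
    for site, start, end in kernel_windows(toy):
        print(f"site at {site}: candidate kernel window [{start}, {end})")
```

A kernel lying in the windows of several upstream sites would be expected to give several amplicon sizes, in line with the multi-band patterns described for FIG. 24.
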

[0358] This example demonstrates that a representative normalized primary
PENTAmer library can be produced from a partial Sau3A I restriction digest.

EXAMPLE 6 

GENOME WALKING SEQUENCING OF 50 SAMPLE SEQUENCES IN E. COLI
USING PRIMARY PENTAMER LIBRARY PREPARED FROM PARTIAL SAU3A I
RESTRICTION DIGEST

[0359] This example validates a direct genome walking sequencing strategy for
gap filling and de novo sequencing of genomes of the complexity of E. coli from a PENTAmer
library prepared with a frequently cutting restriction enzyme.

[0360] Fifty random oligonucleotides specific for regions of the E. coli genome
located approximately 50-250 bp downstream of Sau3A I restriction sites are designed using
Oligo Primer Analysis software (Molecular Biology Insights). Magnetic beads containing E.
coli PENTAmer library DNA described in Example 4 are prewashed with water and 1 µl
(approximately 1.1% of the total library DNA) is used as template for PCR amplification with
100 nM of universal adaptor B primer (primer 18) and 100 nM of each E. coli kernel primer
in a final volume of 25 µl. After initial denaturing at 95°C for 1 min, 32 cycles are carried
out at 94°C for 10 sec and 68°C for 75 sec. Five µl aliquots of 40 representative reactions are
separated on a 1% agarose gel and visualized on a Fluor S MultiImager (Bio-Rad) after staining
with Sybr Gold. As shown in Example 5 (FIG. 24), specific patterns of fragments are
generated for each sequence.

[0361] PCR amplicons are purified free of polymerase, nucleotides and primers
by Qiaquick PCR purification kit (Qiagen; Valencia, CA) and are eluted in 30 µl of EB buffer
(Qiagen; Valencia, CA; 10 mM Tris-HCl, pH 8.5). DNA is quantitated by mixing 15 µl of
serial dilutions of the purified samples with an equal volume of 1:200 diluted Pico Green reagent
(Molecular Probes; Eugene, OR) in TE buffer, incubating at room temperature for 5 min and
spotting 20 µl aliquots along with standard amounts of DNA (Low DNA Mass Ladder, Life
Technologies; Rockville, MD) on Parafilm (American National Can). DNA is quantitated on a
Bio-Rad (Hercules, CA) Fluor S MultiImager using the volume tool of Quantity One
software (Bio-Rad).

[0362] Cycle sequencing is performed by mixing 11 µl of DNA samples
containing 55-80 ng of total DNA with 1 µl of 5 µM of each kernel primer used originally in
PCR (above) and 8 µl of DYEnamic ET terminator reagent mix (Amersham Pharmacia
Biotech; Piscataway, NJ) in 96 well plates in a final volume of 20 µl. Amplification is
performed for 30 cycles at: 94°C for 2 sec, 58°C for 15 sec, and 60°C for 75 sec. Samples are
precipitated with 70% ethanol and analyzed on a MegaBACE 1000 capillary sequencing
system (Amersham Pharmacia Biotech; Piscataway, NJ) under the manufacturer's protocol.

[0363] Alternatively, cycle sequencing is done using the Thermo Sequenase
Cy5.5 Dye Terminator Cycle Sequencing kit (Amersham Pharmacia Biotech; Piscataway,
NJ) by mixing 24 µl of template containing 20-50 ng of DNA with 1 µl of 10 µM primer, 1
µl of each individual Cy5.5 dye-labeled ddNTP terminator, 3.5 µl of reaction buffer
concentrate, and 20 units of Thermo Sequenase DNA polymerase in a total volume of 31.5 µl.
After initial denaturing at 94°C for 1 min, amplification is performed for 30 cycles at: 94°C
for 10 sec, 58°C for 30 sec, and 72°C for 1 min. Samples are purified by DyeEx dye
terminator removal kit (Qiagen; Valencia, CA) and analyzed on an OpenGene sequencing
system (Visible Genetics).

[0364] Table VIII shows a summary of the sequencing results obtained with fifty
E. coli kernel primers on the MegaBACE 1000 sequence analyzer in a single run. On
average, read lengths of the analyzed sequences are on the order of 500 bases. A sequence is
considered to be a failure if fewer than 100 bases are called. At a preset threshold score of
>20 using the Phred algorithm (Codon Code Corporation; Dedham, MA), which corresponds
to an error probability of 1%, twenty-two percent of the sequences failed, whereas at a Phred
value of >10 (90% accuracy), the failure rate is 20%.
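
The correspondence between Phred thresholds and accuracy quoted above follows from the standard definition of the Phred quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below only restates that definition in Python; the function names are illustrative and are not part of the Phred software.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20):
        p = phred_to_error_probability(q)
        print(f"Phred {q}: error probability {p:.2%}, accuracy {1 - p:.2%}")
```
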



Table VIII. Summary of 50 E. coli Kernel Sites Sequenced Directly from Primary
PENTAmer Library of Partial Sau3A I Restriction Digest

Read length in bases; in every column a sequence is scored as failure* when <100 bases are called.

Sequence ID (a)   Phred >20 (99%     Phred >10 (90%     Cimarron 1.53 Slim Phredify
                  accuracy) (b)      accuracy) (c)      / Quality Index (d)
---------------   ----------------   ----------------   ---------------------------
S1                [illegible]        [illegible]        [illegible]
S2                614                677                651 / 95
S3                557                593                706 / 95
S4                failure*           failure*           failure*
S5                399                421                414 / 96
S6                665                757                844 / 91
S7                failure*           failure*           failure*
S8                673                706                435 / 95
S9                failure*           failure*           failure*
S10               383                423                453 / 95
S11               569                605                618 / 94
S12               449                533                629 / 92
S13               494                533                627 / 93
S14               527                540                550 / 97
S15               573                619                633 / 96
S16               111                129                549 / 90
S17               failure*           failure*           failure*
S18               679                765                773 / 91
S19               611                682                812 / 93
S20               676                741                906 / [illegible]
S21               609                628                631 / 96
S22               683                [illegible]        [illegible]
S23               failure*           141                178 / 81
S24               533                584                [illegible]
S25               670                711                780 / 96
S26               489                698                398 / 88
S27               580                618                736 / 94
S28               628                663                689 / 97
S29               failure*           failure*           failure*
S30               438                501                429 / 93
S31               failure*           failure*           failure*
S32               565                620                574 / 96
S33               109                153                248 / 87
S34               174                267                341 / 86
S35               210                314                301 / 89
S36               456                530                596 / 91
S37               607                636                729 / 95
S38               565                612                608 / 97
S39               490                593                586 / 94
S40               failure*           failure*           failure*
S41               163                267                320 / 87
S42               500                577                397 / 93
S43               573                610                618 / 95
S44               failure*           failure*           415 / 85
S45               failure*           failure*           306 / 84
S46               failure*           failure*           321 / 86
S47               480                543                553 / 93
S48               460                526                506 / 92
S49               498                554                713 / 91
S50               234                406                239 / 86
---------------   ----------------   ----------------   ---------------------------
Failure rate      22%                20%                14%
Average read      495                546                554
length (not
including
failures)
Average quality                                         92
index

(a) Specific kernel E. coli primers annealing 1-250 bases downstream from Sau3A I
sites used in cycle sequencing.

(b) Number of bases the Phred (Codon Code Corporation, Dedham, MA) algorithm
considers above the threshold score of 20.
A Phred score of 20 corresponds to an error probability of 1%.

(c) Number of bases the Phred (Codon Code Corporation, Dedham, MA) algorithm
considers above the threshold score of 10.
A Phred score of 10 corresponds to an error probability of 10%.

(d) Number of bases called by the Cimarron 1.53 Slim Phredify basecaller (Amersham
Pharmacia Biotech Inc., Piscataway, NJ).
The Quality Index corresponds to the accuracy rate of the called bases.

* A sequence is considered a failure when fewer than 100 bases are called.
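
The failure rates in the summary rows of Table VIII can be cross-checked against the individual rows. The Python sketch below does this for the rows marked failure* in each column, taking the sample identifiers as read from the table (S1 to S50). The dictionary keys and the function name are illustrative assumptions. Rows whose values are not legible are taken as non-failures, which is itself an assumption, although it is the one that reproduces the stated rates.

```python
TOTAL_SAMPLES = 50

# Samples marked failure* (fewer than 100 bases called) in each column of Table VIII.
FAILED = {
    "phred_gt_20": {"S4", "S7", "S9", "S17", "S23", "S29", "S31",
                    "S40", "S44", "S45", "S46"},
    "phred_gt_10": {"S4", "S7", "S9", "S17", "S29", "S31",
                    "S40", "S44", "S45", "S46"},
    "cimarron": {"S4", "S7", "S9", "S17", "S29", "S31", "S40"},
}


def failure_rate_percent(failed: set[str], total: int = TOTAL_SAMPLES) -> float:
    """Return the percentage of samples counted as failures."""
    return 100 * len(failed) / total


if __name__ == "__main__":
    for column, failed in FAILED.items():
        rate = failure_rate_percent(failed)
        print(f"{column}: {len(failed)} of {TOTAL_SAMPLES} failed ({rate:.0f}%)")
```
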

[0365] In addition, forty-six PCR samples out of the fifty analyzed in Table VIII
are sequenced using the Thermo Sequenase Cy5.5 Dye Terminator Cycle Sequencing kit
(Amersham Pharmacia Biotech) as described above and analyzed on the OpenGene sequencing
system (Visible Genetics). Average data from two independent amplification and cycle
sequencing reactions at a threshold score of >20 using the Phred algorithm produced read
lengths of 291 bases. The failure rate of samples yielding read lengths of less than 100 bases
in this sequencing protocol at a Phred value of >20 is 17%.

[0366] Combining the results from the two sets of direct sequencing experiments
from the primary PENTAmer library yielded a total of 6 failed samples out of 50, representing a
success rate of 88% at a Phred value of >20. This result suggests that almost half of the
samples that fail in either one of the two sequencing protocols are random failures.
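
The combined success rate can be reproduced with simple arithmetic, sketched below in Python. The 50 samples, the 6 samples that failed in both protocols, and the 22% MegaBACE failure rate are taken from the text. Reading "almost half" as the share of MegaBACE failures that succeeded in the second protocol is an interpretation offered here, not a statement from the source, and the function name is hypothetical.

```python
def success_rate_percent(failed: int, total: int) -> float:
    """Return the percentage of samples that did not fail."""
    if not 0 <= failed <= total or total == 0:
        raise ValueError("need 0 <= failed <= total and total > 0")
    return 100 * (total - failed) / total


TOTAL = 50
FAILED_MEGABACE = 11   # 22% of 50 at Phred >20
FAILED_COMBINED = 6    # failed in both sets of experiments

# MegaBACE failures that gave usable sequence in the other protocol.
rescued = FAILED_MEGABACE - FAILED_COMBINED
rescued_fraction = rescued / FAILED_MEGABACE

if __name__ == "__main__":
    print(f"combined success rate: {success_rate_percent(FAILED_COMBINED, TOTAL):.0f}%")
    print(f"MegaBACE failures rescued: {rescued} of {FAILED_MEGABACE} "
          f"({rescued_fraction:.0%})")
```
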

[0367] Five of the samples that failed in the first sequencing attempt (FIG. 24,
samples S7, S9, S23, S29, and S40) are re-sequenced through the Visible Genetics protocol,
using the same primers in PCR amplification but nested sequencing primers. All of them
produced good sequence data, with an average read length of 234 bases at Phred of >20.

[0368] This example demonstrates that an average of 88% of random genomic E.
coli sequences can be amplified directly from a primary PENTAmer library of a partial
restriction digest with a frequently cutting enzyme. Read lengths are on average 250 bases for
the Visible Genetics instrument and 500 for the MegaBACE instrument, at an
accuracy level of 99%. All of the failed samples that were re-sequenced
using nested primers during cycle sequencing were successful. Due to the length variation in
the termination positions of PENT products during nick-translation ("fuzzy ends"), the
concentration of intervening adaptor B sequences originating from Sau3A I sites upstream of a
given kernel is apparently diluted to a point where no significant interference occurs, and the
read length and quality of the sequencing reactions are comparable to sequencing uniformly
sized PCR fragments. However, some sequences containing very short fragments (for
example, see FIG. 24, lane 21) have reduced concentration of the full length and intermediate
size amplicons due to PCR bias in favor of the shorter fragment. These are usually kernel
sequences which happen to fall in the range of 800 bp to 1 Kb downstream of clusters of
Sau3A I restriction sites. Initiation of PENT synthesis from such clustered Sau3A I sites
brings the kernel sequence in close proximity of adaptor B, resulting in short amplicons. In
other cases, excessive mis-priming and/or incompatibility between kernel and universal
primers is the probable reason for failure. Whatever the reason for sequencing failures, it
should be mentioned that no simple correlation between the pattern of PCR fragments on
FIG. 24 and the failure of sequencing can be established. In cases where amplification of
only short fragments is the suspected reason for sequencing failure, size fractionation of the
PCR products followed by reamplification is performed as described in Example 7.

EXAMPLE 7

GENOME WALKING SEQUENCING IN E. COLI AFTER SIZE FRACTIONATION
OF PCR AMPLICONS OBTAINED FROM PRIMARY PENTAMER LIBRARY OF
PARTIAL SAU3A I RESTRICTION DIGEST

[0369] This Example shows that samples amplified directly from a primary
PENTAmer library of partial Sau3A I restriction digest can be size-separated and re-amplified
by PCR to eliminate interference of very short fragments on the read length and/or
the quality of the sequencing data. Selected sequences among the 55 originally studied in
Example 6 are analyzed by creating a pool of the PCR products from the first amplification
followed by size fractionation to reduce the bias against large fragments.

[0370] After amplification of fifty-five E. coli kernel sequences described in
Example 5, aliquots of 1 µl of each individual PCR sample are combined and 12 µl are subjected
to Reverse Field Isodimensional Focusing (RF-IDF) electrophoresis as follows. The combined
sample is run on a 1% agarose gel in the forward direction at 6 V/cm. A section of
the gel containing a lane of standards (1 Kb+, Life Technologies; Rockville, MD) and a lane
of the DNA sample is excised, stained with Sybr Gold, and bands are visualized on a Dark
Reader Blue Light Transilluminator (Clare Chemical Research). The region of the gel below
700 bp is then cut out and removed. The remaining portion of the stained slice is aligned back
with the unstained gel and used as a landmark for cutting and removing the fraction
containing undesired small molecules. The unstained gel is run in the reverse direction at 6
V/cm for 85% of the forward time. After electrophoresis is complete the gel is stained with
Sybr Gold. The band of PENTAmer molecules, now focused in a narrow region, is excised and
eluted at 5,000 x g for 15 min using an Ultrafree-DA gel extraction device (Millipore). The
sample is diluted between 10,000- and 50,000-fold and used as template for re-amplification by
PCR using individual kernel primers and universal adaptor B1 primer (primer 18). FIG. 25 shows
an example of two E. coli genomic sequences amplified after size fractionation. Essentially
all short fragments are eliminated in the second amplification step.

[0371] PCR amplified samples are purified by Qiaquick PCR purification kit
(Qiagen; Valencia, CA), eluted in 30 µl of EB buffer (Qiagen; Valencia, CA) and sequenced
as described in Example 6.

[0372] Three failed samples from the first approach are resequenced through the
Visible Genetics sequencing protocol, using the size-fractionated library as template. One
sequence had a read length of 259 bases (Phred >20). A second sequence produced a read
length of less than 100 bases at a Phred value of >20; however, this sample (Table VIII,
sample S31) was base called by the Visible Genetics software and had a contig of 346 bases
matching 99% of the published E. coli database sequence. The third sequence did not yield
useful sequence data but was among the samples successfully sequenced through the
MegaBACE protocol directly from the primary library (Table VIII, sample S13). The only
sample producing an ambiguous result in both sequencing attempts (Table VIII, sample S31) not
only contains a cluster of five Sau3A I restriction sites within 0.8-1 Kb upstream of the
kernel, but also the 12 bases at its 5' terminus are part of a repetitive element in the E. coli
genome.

[0373] To test the overall performance of sequencing following size fractionation,
fourteen additional samples from the size-fractionated pool were analyzed on the MegaBACE
1000 sequencer. Seven samples had an average read length of 575 bases (Phred >20) and
seven had read lengths under 100 bases (Phred >20), thus yielding a success rate of only 50%.

[0374] In summary, combining the three approaches for sequencing E. coli
genomic sequences from a primary PENTAmer library of partial Sau3A I restriction digest, (i)
direct sequencing after PCR from the primary library with kernel and universal primer, (ii) nested
kernel primers during cycle sequencing, and (iii) size-fractionation of pooled PCR amplicons
followed by PCR re-amplification, collectively yielded a 100% success rate for the 50 E. coli
sequences analyzed in Example 6 and Example 7, with only one ambiguous sequence.

EXAMPLE 8

PREPARATION AND ANALYSIS OF SECONDARY PENTAMER LIBRARY
FROM E. COLI SAU3A I PARTIAL GENOMIC DIGEST

[0375] This example describes the preparation of a secondary library derived from
the PENTAmer E. coli BamH I library shown in Example 5. The library is prepared by PCR
amplification of the primary library in the presence of dUTP and biotinylated B adaptor
oligonucleotide, capture of the biotinylated strand on magnetic beads and blocking of its
3' end by transfer of dideoxy adenosine with terminal transferase.

[0376] One microliter of primary PENTAmer E. coli Sau3A I genomic library
beads (approx. 1% of the total library) is used as PCR template with biotinylated adaptor B1
primer (primer 19) and adaptor A3 PCR primer (primer 14) in the presence of 0.2 mM of
each dNTP and 0.3 mM dUTP. After 23 cycles at 94°C for 10 sec and 68°C for 75 sec,
eleven reaction tubes of 25 µl are combined. The sample is purified using Qiaquick PCR
purification kit (Qiagen; Valencia, CA) and eluted in 100 µl of EB buffer (10 mM Tris-HCl,
pH 8.5). Library DNA is further size-fractionated by RF-IDF electrophoresis. The sample is
loaded on a preparative 0.7% pulse-field grade agarose gel (Bio-Rad) along with 1 Kb+ ladder
(Life Technologies; Rockville, MD). Electrophoresis in the forward direction is performed at
6 V/cm in interrupted mode (60 sec on, 5 sec off) for 1.4 hours. A section of the gel
containing a lane of standards and a lane of the DNA sample is excised, stained with Sybr
Gold, and bands are visualized on a Dark Reader Blue Light Transilluminator (Clare Chemical
Research). The DNA size region smaller than 1 Kb is cut out and removed. The remaining
portion of the stained slice is aligned back with the unstained gel and used as a landmark for
cutting and removing the fraction containing molecules below 1 Kb in size. The unstained
gel is then run in the reverse direction in an interrupted field of 6 V/cm (60 sec on, 5 sec off)
for 1.1 hours. After electrophoresis is complete, the gel is stained with Sybr Gold. The bands of
interest, focused in a sharp narrow region, are cut out and recovered from the agarose using a Gel
Extraction kit (Qiagen; Valencia, CA) in 10 mM Tris-HCl pH 8.5.

[0377] Seven hundred and fifty micrograms of streptavidin coated Dynabeads M-280
(Dynal) are prewashed with TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl,
2 mM EDTA, 2 M NaCl, pH 7.5). The DNA sample is mixed with an equal volume of
bead suspension in 2x BW buffer and placed on a rotary shaker for 1 hr at room temperature.
The beads are bound to a magnet and washed with 3 x 100 µl each of 1 x BW buffer and TE-L
buffer. Non-biotinylated DNA is removed by incubating the beads with 100 µl of 0.1 N
NaOH for 5 min at room temperature. Beads are washed with 100 µl of 0.1 N NaOH,
neutralized by washing with 5 x 100 µl of TE-L buffer, and resuspended in 66 µl of water.

[0378] To prevent free 3' termini from mispriming during primer extension,
library beads are supplemented with 1x terminal transferase buffer (Roche; Nutley, NJ), 0.25
mM CoCl2, 0.1 mM ddATP, and 60 units of terminal transferase (NEB) in a final volume of
100 µl and the reaction is carried out at 37°C for 30 min. Beads are washed with 2 x 100 µl each
of TE-L buffer and 1 x BW buffer, resuspended in 120 µl of storage buffer (0.5 M NaCl, 10 mM
Tris-HCl, 10 mM EDTA, pH 7.5) and stored at 4°C.

EXAMPLE 9 

MULTIPLEXED LINEAR AMPLIFICATION OF E. COLI GENOMIC KERNEL
SEQUENCES FROM SECONDARY E. COLI PENTAMER LIBRARY DERIVED
FROM SAU3A I PARTIAL DIGEST

[0379] This Example describes the amplification of three E. coli sequences in a
multiplexed linear amplification cycling reaction from a secondary dU-containing Sau3A I
PENTAmer library bound to magnetic beads, prepared as described in Example 8. Linear
amplification is performed in the presence of a 3'-blocked oligonucleotide annealing in the
region of adaptor B to prevent newly synthesized single stranded molecules from self-priming.
The second strand is extended by adding an excess of unblocked adaptor B primer.
After removal of magnetic beads, full-size products are purified by size fractionation, dU-containing
molecules are destroyed by treatment with uracil DNA glycosylase, and the
sequences enriched by multiplexed linear amplification are segregated by PCR amplification
with individual kernel primers and universal adaptor B1 primer.

[0380] Three oligonucleotides specific for E. coli kernel sequences adjacent to
Sau3A I restriction sites are mixed in 1 x AdvanTaq+ buffer (Clontech; Palo Alto, CA) at a
final concentration of 40 nM each with 100 nM of 3'-blocked oligonucleotide (primer 17), 10
mM each dNTP, 10 µl of secondary dU-containing Sau3A I PENTAmer library beads
(Example 8) and 1 x AdvanTaq+ hot start DNA polymerase in a final volume of 60 µl.
An identical control reaction is assembled which lacks DNA polymerase. After initial denaturing
at 94°C for 1 min, samples are subjected to 29 cycles at 94°C for 10 sec and 68°C for 75 sec.
Adaptor B1 PCR primer (primer 18) is added at a final concentration of 330 nM and two more
cycles are performed at 94°C for 10 sec and 68°C for 75 sec to fill in the second strand.

[0381] Samples are subjected to electrophoresis on a 1% agarose gel, stained with
Sybr Gold, and bands are visualized on a Dark Reader Blue Light Transilluminator (Clare
Chemical Research). The bands of 1 Kb are cut out and eluted at 5,000 x g for 15 min using an
Ultrafree-DA gel extraction filter (Millipore; Bedford, MA). After 30-fold dilution in 10 mM
Tris-HCl, pH 7.5, aliquots of 50 µl are supplemented with one unit of heat-labile uracil DNA
glycosylase (UDG, Roche; Nutley, NJ) and incubated for 45 min at 20°C. UDG is heat-inactivated
at 95°C for 10 min and samples are analyzed by PCR.

[0382] One microliter of each sample is applied as template for PCR with 200 nM
of each individual kernel primer used for linear amplification and 200 nM universal adaptor
B1 primer (primer 18). In multiplexed mode, a mixture of the three primers at 80 nM each
and 200 nM of universal adaptor B1 primer (primer 18) are used. PCR samples are analyzed
on a 1% agarose gel after staining with Sybr Gold. FIG. 26 shows the result of this analysis.
All three sequences are amplified as full-size fragments. The products of the PCR
amplification are purified by Qiaquick PCR purification (Qiagen; Valencia, CA), eluted in 30
µl of 10 mM Tris-HCl, pH 8.5, and aliquots containing 20-50 ng of DNA are sequenced with the
Thermo Sequenase Cy5.5 Dye Terminator Cycle Sequencing kit (Amersham Pharmacia
Biotech) on the OpenGene sequencing system (Visible Genetics) as described in Example 6 with
the same kernel primers used in linear amplification and PCR. All three sequences are
confirmed.

EXAMPLE 10

PREPARATION AND ANALYSIS OF PENTAMER LIBRARIES FROM HUMAN
GENOMIC DNA AFTER COMPLETE BAMH I OR PARTIAL SAU3A I DIGESTION

[0383] This example describes the preparation of primary human genomic
PENTAmer libraries bound to magnetic beads and their amplification with universal adaptor
primers.

[0384] Aliquots of 10 micrograms of genomic DNA prepared by standard
purification from fresh human lymphocytes are digested with 140 units of BamH I (NEB) for
6 hours at 37°C or with 20 units of Sau3A I (New England Biolabs; Beverly, MA) for 35 min
at 37°C. Twenty µg of BamH I or 50 µg of Sau3A I digested DNA are treated with 3
units/µg of SAP (Roche; Nutley, NJ) for 20 min at 37°C, SAP is heat-inactivated for 15 min
at 65°C, and DNA is purified by extraction with an equal volume of phenol:chloroform:isoamyl
alcohol (25:24:1) and precipitation with ethanol. DNA fragments are size-fractionated by
preparative RF-IDF in 0.75% pulse-field grade agarose (Bio-Rad; Hercules, CA) gel.
Electrophoresis in the forward direction is performed at 6 V/cm in interrupted mode (60 sec on,
5 sec off) for 2 hours. After cutting out the section of the gel containing DNA molecules below 2
Kb, a reverse field of 6 V/cm (60 sec on, 5 sec off) is applied for 1.7 hours. Bands are excised
and recovered from the agarose by Gel Extraction Kit (Qiagen; Valencia, CA) in 10 mM
Tris-HCl, pH 8.5.

[0385] Samples are mixed with 1.2 pmoles (BamH I) or 6 pmoles (Sau3A I) of
pre-assembled BamH I nick-translation adaptor (adaptor A3 consisting of primers 11, 12,
and 13) and, after heating at 65°C for 1 min, ligation is carried out at 20°C for 2.5 hours with
4,800 units of NEB T4 ligase (BamH I) or 11,200 units of NEB T4 ligase (Sau3A I). To
remove ligase and excess free adaptor, the sample is extracted with an equal volume of
phenol:chloroform:isoamyl alcohol (25:24:1), mixed with 1/4 vol of QF buffer (240 mM
NaCl, 3% isopropanol, and 10 mM Tris-HCl, pH 8.5 final concentrations) in a volume of 400
µl, and centrifuged at 200 x g to a volume of 100 µl in Microcon YM-100 filtration units. The
samples are washed 3 times with 400 µl of TE-L buffer at 200 x g and concentrated to a final
volume of 65 µl (BamH I) or 120 µl (Sau3A I).
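
The QF buffer step quotes final concentrations after adding 1/4 volume of buffer to the sample. As an illustrative calculation only (not part of the disclosed protocol), the following Python sketch works out what stock concentrations that implies, assuming the sample itself contributes none of the listed components; the function and variable names are hypothetical.

```python
from fractions import Fraction


def stock_concentration(final_conc, added_fraction=Fraction(1, 4)):
    """Stock concentration needed so that adding `added_fraction` sample
    volumes of stock gives `final_conc` in the mixture.

    Assumption: the sample contains none of the component beforehand.
    """
    # Adding V/4 of stock to V of sample: the stock is (1/4)/(1 + 1/4) = 1/5
    # of the total volume, so it must be 5x concentrated.
    share_of_total = added_fraction / (1 + added_fraction)
    return final_conc / share_of_total


# Final concentrations quoted in the text for QF buffer.
qf_final = {"NaCl_mM": 240, "isopropanol_pct": 3, "Tris_HCl_mM": 10}

# Implied stock composition under the assumption stated above.
qf_stock = {name: stock_concentration(Fraction(value))
            for name, value in qf_final.items()}

for name, value in qf_stock.items():
    print(f"{name}: {float(value):g}")
```

Under these assumptions the stock would be a 5x concentrate; any Tris-HCl already present in the sample would lower the stock Tris-HCl concentration actually required.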

[0386] The purified samples are subjected to nick-translation with 19 units (BamH
I) or 38 units (Sau3A I) of wild type Taq polymerase in 1x Perkin Elmer (Norwalk, CT) PCR
buffer II containing 4 mM MgCl2 and 200 µM of each dNTP in a final volume of 120 µl
(BamH I) or 240 µl (Sau3A I) for 5 min at 50°C. Reactions are stopped by addition of
EDTA to a final concentration of 20 mM and products are analyzed on a 6% TBE-urea gel
(Novex; San Diego, CA) after staining with Sybr Gold.

[0387] Samples are supplemented with blocking oligonucleotide complementary
to the nick-translation template strand at the region of the adaptor (primer 15) at a final
concentration of 1 µM, denatured by boiling at 100°C for 3 min, and cooled on ice for 5 min.
Eighteen hundred micrograms of streptavidin-coated Dynabeads M-280 (Dynal) are
prewashed with TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl, 2 mM
EDTA, 2 M NaCl, pH 7.5). Denatured DNA samples are mixed with an equal volume of beads
(1/3 of the total beads with the BamH I and 2/3 with the Sau3A I sample) in 2x BW buffer and
placed on a rotary shaker for 1.5 hr at room temperature. The beads are bound to a magnet and
washed 2 x with 100 µl each of 1x BW buffer and TE-L buffer. Non-biotinylated DNA is
removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at room temperature.
Beads are washed with 100 µl of 0.1 N NaOH, neutralized by washing with 5 x 100 µl of TE-
L buffer, and resuspended in TE-L buffer.

[0388] Library DNA samples are then processed for ligation with adaptor B. To
minimize formation of adaptor A-B dimers on magnetic beads, the bead suspensions are
supplemented with 1x T4 ligase buffer (NEB) and incubated with 50 pmoles of 3'-blocked
oligonucleotides (primers 20 and 21) as described in Example 5. The suspensions are heated
for 1 min at 60°C, cooled to room temperature, and incubated for 10 min at room temperature
to anneal the blocking oligonucleotides to residual adaptor A molecules bound to magnetic
beads. Beads are then washed with 50 µl of 1x T4 ligase buffer and resuspended in 50 µl of
the same buffer. The samples are supplemented with 40 pmoles (BamH I) or 80 pmoles
(Sau3A I) of each adaptor B1 oligonucleotide (primers 16 and 17) in 1x T4 ligase buffer and
4000 units (BamH I) or 8000 units (Sau3A I) of T4 ligase (NEB) in a final volume of 100 µl
(BamH I) or 200 µl (Sau3A I). Ligation is performed at room temperature for 3.5 hours on an
end-to-end rotary shaker to keep the beads in suspension. Beads are bound to a magnet,
washed with 2 x 100 µl each of 1x BW buffer and TE-L buffer, and non-biotinylated DNA is
removed by incubating the beads in 100 µl of 0.1 N NaOH for 5 min at room temperature.
Beads are washed with 100 µl of 0.1 N NaOH, neutralized by washing with 5 x 100 µl of TE-
L buffer, resuspended in 160 µl (BamH I) or 280 µl (Sau3A I) of SB buffer, and stored at
4°C.

[0389] FIG. 27 shows amplification of the primary PENTAmer libraries from
human genomic DNA prepared by complete BamH I or partial Sau3A I digestion. Magnetic
beads containing library DNA are prewashed in water and 0.5 µl of each library is used as
template for PCR amplification with 100 nM of universal adaptor A3 and adaptor B1 primers
(primers 13 and 18) in a final volume of 25 µl. After initial denaturing, the indicated number of
cycles are carried out at 94°C for 10 sec and 68°C for 75 sec. Ten µl aliquots are separated on a
1% agarose gel and visualized on a Fluor-S MultiImager (Bio-Rad) after staining with Sybr
Gold.

[0390] This example demonstrates that primary PENTAmer libraries can be
prepared and amplified from eukaryotic genomic DNA.

EXAMPLE 11 

PREPARATION AND ANALYSIS OF SINGLE-STRANDED CIRCULAR PENTAmer
LIBRARIES FROM HUMAN GENOMIC DNA AFTER COMPLETE BamH I
OR PARTIAL Sau3A I DIGESTION

[0391] This example describes the preparation of circular single-stranded
derivatives of the primary human genomic Sau3A I and BamH I libraries described in Example
10. These circular libraries are used as template for reverse PCR amplification with kernel
human sequences, keeping intact the adaptor tags, which will allow simultaneous analysis of
single nucleotide polymorphic (SNP) regions in multiple individuals.

[0392] Magnetic beads containing primary human BamH I or Sau3A I library
DNA (Example 10) are pre-washed in water and 0.5 µl of each library is used as template for
PCR amplification in 16 individual tubes for each library with 200 nM of 5'-biotinylated
adaptor B1 primer (primer 19) and 5'-phosphorylated adaptor A3 primer (primer 23) in a final
volume of 50 µl. After initial denaturing at 95°C, eighteen cycles of PCR are performed at
94°C for 10 sec and 68°C for 75 sec. Beads are removed on a magnet and the individual PCR
samples for each library are pooled.

[0393] Samples are purified free of primers and Taq polymerase on Qiaquick
PCR purification filters (Qiagen; Valencia, CA) and eluted in 150 µl of 10 mM Tris-HCl, pH
8.5. DNA is polished with 4 units of T4 DNA Polymerase (Roche; Nutley, NJ) in the
presence of 200 nM of each dNTP for 30 min at 25°C. DNA samples are purified on
Qiaquick PCR purification filters (Qiagen; Valencia, CA), supplemented with 1/4 volume of
QF buffer (240 mM NaCl, 3% isopropanol, and 10 mM Tris-HCl, pH 8.5 final
concentrations) in a volume of 400 µl, and centrifuged at 200 x g to a volume of 100 µl in
Microcon YM-100 filtration units. The samples are washed 3 times with 400 µl of TE-L
buffer at 200 x g and concentrated to a final volume of 130 µl.

[0394] Sixteen hundred micrograms of streptavidin-coated Dynabeads M-280
(Dynal) are prewashed with TE-L buffer and resuspended in 2x BW buffer (20 mM Tris-HCl,
2 mM EDTA, 2 M NaCl, pH 7.5). Denatured DNA samples are mixed with an equal volume of
beads in 2x BW buffer and placed on a rotary shaker for 1 hr at room temperature. The beads
are bound to a magnet and washed 2 x with 100 µl each of 1x BW buffer and TE-L buffer.
Beads are resuspended in 100 µl of SB buffer and stored at 4°C.

[0395] One half of the Sau3A I library DNA is incubated with 20 µl of 0.1 N
NaOH for 5 min at room temperature. Eluted non-biotinylated DNA strands are neutralized
with 10 µl of 0.2 N HCl and 3 µl of 1 M Tris-HCl, pH 8.0. The sample is diluted to 100 µl with
water and any residual biotin-containing DNA is removed by incubation with 200 µg of fresh
streptavidin beads for 30 min at room temperature. Single-stranded DNA is purified on
Qiaquick PCR purification filters (Qiagen; Valencia, CA) and eluted in 60 µl of 10 mM Tris-
HCl, pH 8.5.

[0396] Sau3A I library single-stranded DNA is incubated with a 3'-C7 amino
blocked bridging oligonucleotide (primer 24) bringing together adaptor A3 (5'-terminus) and
adaptor B1 (3'-terminus) to form circular molecules by ligation. DNA is aliquoted into four
200 ng samples and incubated with bridging oligonucleotide (primer 24) at 0, 15, 75, or 150
nM final concentration in 1x Tsc ligase buffer (Roche; Nutley, NJ) and a final volume of 30
µl. After initial denaturing at 95°C for 1 min, ligation is performed for 24 cycles at 94°C for
20 sec and 65°C with 5 units of Tsc DNA ligase (Roche; Nutley, NJ).
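
To put the bridging oligonucleotide titration in perspective, the following Python sketch estimates the molar concentration of 200 ng of single-stranded library DNA in 30 µl and compares it with the 0, 15, 75, and 150 nM oligonucleotide concentrations. The average library strand length (1,000 nt) and the average nucleotide mass (330 g/mol) are assumptions made for illustration only; neither value is stated in this example.

```python
AVG_NT_MASS = 330.0       # g/mol per nucleotide of ssDNA (approximate, assumed)
ASSUMED_LENGTH_NT = 1000  # hypothetical average library strand length


def ssdna_nanomolar(mass_ng, length_nt, volume_ul):
    """Approximate molar concentration (nM) of single-stranded DNA."""
    # ng -> pg (x1000); pg divided by g/mol gives pmol.
    pmol = mass_ng * 1000.0 / (length_nt * AVG_NT_MASS)
    # pmol per µl equals µM; multiply by 1000 for nM.
    return pmol / volume_ul * 1000.0


template_nM = ssdna_nanomolar(mass_ng=200, length_nt=ASSUMED_LENGTH_NT, volume_ul=30)

# Bridging oligonucleotide concentrations quoted in the text (nM).
bridge_nM = [0, 15, 75, 150]
molar_ratio = {conc: conc / template_nM for conc in bridge_nM}

print(f"template: about {template_nM:.1f} nM")
for conc, ratio in molar_ratio.items():
    print(f"bridging oligo {conc} nM -> {ratio:.2f}x template")
```

Under these assumptions the titration would span from below to several-fold above an equimolar ratio of bridging oligonucleotide to template; a different average strand length shifts the ratios proportionally.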

[0397] Samples are split into two aliquots of 15 µl and one half is treated with 0.7
units of T4 DNA polymerase (Roche; Nutley, NJ) for 1 hr at 37°C in the absence of dNTPs to
destroy linear DNA molecules. The remaining half is left untreated. Aliquots of each treated
and untreated sample are analyzed on a 6% TBE-urea acrylamide gel (Novex; San Diego, CA)
after staining with Sybr Gold (Molecular Probes; Eugene, OR). FIG. 28 shows the result of
this analysis. In the samples receiving bridging oligonucleotide, a low mobility band appears
corresponding to circularized PENTAmer molecules. Close to 50% of the single-stranded
DNA is converted to circular form in the samples having a high concentration of bridging
oligonucleotide. A faint band with intermediate mobility also appears in the samples ligated
in the presence of bridging oligonucleotide, presumably corresponding to linear concatamers.
Unlike the circular form, both linear species as well as the bridging oligonucleotide are
sensitive to T4 3'-exonuclease activity, since considerable reduction in the intensity of these
bands occurs after T4 DNA polymerase treatment (compare lanes 5, 6, 7, and 8 with 1, 2, 3,
and 4).

[0398] To test the efficiency of amplification from the human circular Sau3A I library,
the remainder of the samples analyzed in FIG. 28 are purified by ethanol precipitation and
dissolved in 20 µl of TE buffer. One microliter aliquots of 10-fold or 500-fold dilutions of
the samples ligated in the presence of 75 nM bridging oligonucleotide are then used as
template for amplification in 30 cycles of PCR. Primers annealing at adaptor A3 which will
amplify only circular DNA molecules (primers 25 and 26) or primers which anneal at adaptor
A3 and adaptor B1 and will amplify both circular and linear molecules (primers 18 and 26)
are used. FIG. 29A shows that the amount of circular DNA molecules before treatment with
the exonuclease activity of T4 polymerase is higher than the amount of circular and linear
DNA after such treatment combined (compare lanes 2 and 4). This result independently
validates the formation of circular single-stranded library molecules. FIG. 29B shows an
attempt at amplification of a kernel human sequence in circular mode with a pair of primers
specific for exon 10 of the human tp53 gene. The same template as in the experiment in
FIG. 29A, but without dilution, was used before or after treatment with exonuclease in 35
cycles of PCR amplification. The products of such amplification would be expected to have a
relatively uniform size distributed around the average length of termination of nick-
translation of PENT molecules in the parental primary library. However, amplicons of
multiple discrete lengths varying from 200 bp to 1 Kb are amplified, indicating more
complex events compared to kernel amplification from a linear library in nested mode
(Example 12).
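
The logic behind the two primer pairs can be sketched with a simplified geometric model in Python: a primer pair yields a product on a linear template only if the two primers point toward each other, whereas on a circular template any oppositely oriented pair can meet by extending around the circle. The coordinates, orientations, and the 1,000 nt template length below are hypothetical values chosen for illustration; the actual positions of primers 18, 25, and 26 are not given in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    name: str
    position: int   # hypothetical coordinate on a 1,000 nt library molecule
    direction: int  # +1 extends toward higher coordinates, -1 toward lower


def gives_product(a, b, circular):
    """Simplified test of whether a primer pair can yield a PCR product."""
    if a.direction == b.direction:
        return False  # both primers extend the same way: no exponential product
    if circular:
        return True   # on a circle, opposite primers always face each other
    forward, reverse = (a, b) if a.direction == 1 else (b, a)
    # On a linear molecule the forward primer must lie upstream of the reverse one.
    return forward.position < reverse.position


# Assumed layout: adaptor A3 near coordinate 0, adaptor B1 near coordinate 1,000.
primer_25 = Primer("primer 25 (adaptor A3)", position=15, direction=-1)
primer_26 = Primer("primer 26 (adaptor A3)", position=20, direction=+1)
primer_18 = Primer("primer 18 (adaptor B1)", position=990, direction=-1)

for pair in ((primer_25, primer_26), (primer_18, primer_26)):
    for circular in (False, True):
        shape = "circular" if circular else "linear"
        result = gives_product(pair[0], pair[1], circular)
        print(f"{pair[0].name} + {pair[1].name} on {shape}: {result}")
```

In this model the divergent adaptor A3 pair amplifies only circular molecules, while the A3/B1 pair amplifies both forms, which matches the behavior described in the text. The model ignores strand polarity and the first-cycle conversion of single-stranded template.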

EXAMPLE 12 

AMPLIFICATION OF HUMAN GENOMIC KERNEL SEQUENCES FROM
PRIMARY PENTAmer LIBRARIES OF COMPLETE BamH I OR PARTIAL
Sau3A I DIGESTS BY NESTED PCR

[0399] This example shows amplification of genomic kernel sequences from
primary human BamH I and Sau3A I libraries by nested PCR. In the first PCR reaction a limited
number of cycles are performed using the distal adaptor B1 primer (primer 18) and a kernel
specific primer up to 500 bp downstream of BamH I or Sau3A I restriction sites. Following
purification of the amplicons, a second PCR is performed with the proximal adaptor B1 primer
(primer 22) and nested kernel primers.

[0400] One microliter of library beads of BamH I or Sau3A I primary human
libraries prepared as described in Example 10 is used as template for PCR amplification
with 50 nM distal adaptor B1 primer (primer 18) and 200 nM kernel primer specific for exon
10 of the human tp53 gene in two aliquots of 25 µl each. After initial denaturing at 94°C for
1 min, samples are subjected to 12 cycles at 94°C for 10 sec and 68°C for 75 sec. The two
aliquots are combined and DNA samples are purified through the Qiaquick PCR purification kit
(Qiagen; Valencia, CA) and eluted in 50 µl of EB buffer (10 mM Tris-HCl, pH 8.5). One
microliter aliquots of the purified DNA samples from the first amplification are used as
templates in a second PCR with 50 nM proximal B1 adaptor primer (primer 22) and 200 nM
nested kernel primer specific for exon 10 of the human tp53 gene which anneals 45 bp
downstream of the kernel primer used in the first PCR amplification. After initial denaturing
at 94°C for 1 min, samples are subjected to 33 cycles at 94°C for 10 sec and 68°C for 75 sec
and 10 µl aliquots are analyzed on a 1% agarose gel after staining with Sybr Gold (FIG. 30A).
Multiple discrete bands are amplified from the primary library of Sau3A I partial digest and a
single band of approximately 500 bp from the library of BamH I complete digest,
respectively. In addition, a second nested kernel primer annealing 83 bp downstream of the
primer in the first PCR is used with BamH I template under the conditions for nested
amplification described above. Comparison of the two nested kernel primers for BamH I
template (FIG. 30B) shows that, as expected, single amplicons differing by approximately 50
bp are produced. The PCR product of nested primer 1 (FIG. 30B; lane 1) is purified by
Qiaquick PCR purification kit (Qiagen; Valencia, CA) and used as template for sequencing
with both nested primers, 1 and 2, with DYEnamic ET terminator reagent mix (Amersham
Pharmacia Biotech) and analyzed on a MegaBACE 1000 capillary sequencing system
(Amersham Pharmacia Biotech) as described in Example 6.
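
The expected size shift between the two nested products follows directly from the primer offsets given above. The short Python sketch below, offered only as an illustration, computes that shift; it takes the approximately 500 bp band reported for nested primer 1 as the starting length, and the function name is hypothetical.

```python
OFFSET_NESTED_1 = 45  # bp downstream of the first-round kernel primer (from the text)
OFFSET_NESTED_2 = 83  # bp downstream of the first-round kernel primer (from the text)


def shifted_product_length(reference_length, reference_offset, new_offset):
    """Expected product length when only the kernel primer moves downstream.

    Assumes the adaptor-side primer and the template end are unchanged.
    """
    return reference_length - (new_offset - reference_offset)


size_difference = OFFSET_NESTED_2 - OFFSET_NESTED_1

# The text reports a band of approximately 500 bp with nested primer 1.
length_primer_1 = 500
length_primer_2 = shifted_product_length(length_primer_1, OFFSET_NESTED_1, OFFSET_NESTED_2)

print(f"expected shift: {size_difference} bp")
print(f"nested primer 2 product: about {length_primer_2} bp")
```

The computed 38 bp shift is in line with the roughly 50 bp difference estimated from the agarose gel, given the resolution of a 1% gel in this size range.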

[0401] Additional sequences are amplified by PCR with adaptor B1 universal
primers (primers 18 and 22) and the following pairs of nested primers: one specific for a
PENTAmer covering exons 2 and 3 of the human tp53 gene using the BamH I library as template,
and two covering exons 4 and 5, and 6, 7, and 8, respectively, using the Sau3A I library as
template (FIG. 31). Primary and secondary (nested) PCR rounds are carried out as described
above. In the cases where multiple fragments are obtained (Sau3A I), the bands are excised
from the agarose gel, extracted with Ultrafree-DA gel extraction kit (Millipore; Bedford, MA),
and appropriate dilutions are used as templates for re-amplification in individual PCR
reactions with the same primers used in secondary PCR. The amplification products are
purified with Qiaquick PCR purification kit (Qiagen; Valencia, CA) and sequenced as above
with the corresponding nested primers used in PCR.

[0402] An average read length of 509 bases is achieved with the four human tp53
samples sequenced at a quality index of 94 (accuracy of 94%) using the Cimarron 1.53 Slim
Phredify Basecaller algorithm (Amersham Pharmacia Biotech).

[0403] This example demonstrates that kernel genomic sequences can be
amplified after nested PCR from primary genomic human PENTAmer libraries prepared by
complete or partial restriction digestion.

[0405] All of the compositions and/or methods disclosed and claimed herein can
be made and executed without undue experimentation in light of the present disclosure.
While the compositions and methods of this invention have been described in terms of
preferred embodiments, it will be apparent to those of skill in the art that variations may be
applied to the compositions and methods and in the steps or in the sequence of steps of the
methods described herein without departing from the concept, spirit and scope of the
invention. More specifically, it will be apparent that certain agents which are both
chemically and physiologically related may be substituted for the agents described herein
while the same or similar results would be achieved. All such similar substitutes and
modifications apparent to those skilled in the art are deemed to be within the spirit, scope and
concept of the invention as defined by the appended claims.

We claim:

1. A method of producing a consecutive overlapping series of nucleic acid
sequences from a DNA sample, comprising the steps of:

(a) generating a first amplifiable nick translation product, wherein said
nick translation of said first amplifiable nick translation product initiates from a known
nucleic acid sequence in the DNA sample;

(b) determining at least a partial sequence from said first nick translation
product; and

(c) generating at least a second amplifiable nick translation product,
wherein said nick translation of said second amplifiable nick translation product initiates
from the partial sequence of said first nick translation product.

2. A method of producing a library of consecutive overlapping series of nucleic
acid sequences from a DNA sample comprising DNA molecules having a region comprising
a known nucleic acid sequence, the method comprising the steps of:

(a) digesting DNA molecules of the DNA sample with a first sequence-
specific endonuclease to generate a plurality of DNA fragments;

(b) generating a first amplifiable nick translation product, wherein said
nick translation of said first amplifiable nick translation product initiates from the known
nucleic acid sequence;

(c) determining at least a partial sequence from said first nick translation
product; and

(d) generating one or more additional amplifiable nick translation
products, wherein said nick translation of said one or more amplifiable nick translation
products initiates from the partial sequence of a previous nick translation product.

3. The method of claim 2, wherein said method further comprises the step of
digesting DNA molecules with at least a second sequence-specific endonuclease, wherein the 
preceding overlapping nick translation product is generated from a DNA fragment from 
digestion with the first sequence-specific endonuclease or from digestion with the second 
sequence-specific endonuclease. 

4. A method of producing a library of consecutive overlapping series of nucleic
acid sequences, comprising the steps of:

(a) obtaining a DNA sample comprising DNA molecules having a region
comprising a known nucleic acid sequence;

(b) partially cleaving the DNA molecules with a sequence-specific
endonuclease to generate a plurality of DNA ends;

(c) separating the cleaved DNA molecules;

(d) generating a first amplifiable nick translation product, wherein said
nick translation of said first amplifiable nick translation product initiates from a known
nucleic acid sequence;

(e) determining at least a partial sequence from said first nick translation
product; and

(f) generating one or more amplifiable nick translation products, wherein
said nick translation of said one or more amplifiable nick translation products initiates from
the partial sequence of a previous nick translation product.

5. The method of claim 4, wherein the separation of the cleaved DNA molecules 
is according to size. 

6. The method of claim 5, wherein the size separation is by gel size fractionation.

7. The method of claim 4, wherein the nick translation products are amplified. 

8. The method of claim 7, wherein the amplification of the nick translation
product comprises polymerase chain reaction utilizing a first primer specific to a known
sequence in the nick translation product and a second primer specific to an adaptor sequence
of the nick translation product.

9. The method of claim 7, wherein at least one of the nick translation products is
selectively amplified from the plurality of nick translation products.

10. The method of claim 7, wherein the nick translation product is single stranded. 

11. The method of claim 4, wherein the partial cleavage of the DNA molecules
comprises cleaving for a selected time with a frequently cutting sequence-specific
endonuclease, wherein the sequence-specificity of the endonuclease is to three or four
nucleotide bases.

12. The method of claim 4, wherein the partial cleavage of the DNA molecules 
comprises subjecting the DNA molecules to a methylase prior to subjection to a methylation- 
sensitive sequence-specific endonuclease. 

13. The method of claim 9, wherein the selective amplification comprises: 

(a) introducing to said plurality of nick translation products a plurality of 
primers, wherein the primers comprise: 

(1) nucleotide base sequence complementary to an adaptor
sequence in the nick translation product;

(2) an additional variable 3' terminal nucleotide; and

(3) a label; 

(b) hybridizing the primers to their complementary nucleic acid sequences 
in the adaptor to form a mixture of primer/nick translate molecule hybrids; and 

(c) extending from a primer having the 3' terminal nucleotide 
complementary to the nucleotide in the nick translate molecule immediately adjacent to the 
adaptor sequence, wherein the hybridizing and extending steps form a mixture of unextended 
primer/nick translate molecule hybrids and extended primer molecule/nick translate molecule 
hybrids. 

14. The method of claim 13, wherein the method further comprises: 

(a) binding of the mixture by the label to a support; 

(b) washing the support-bound mixture to remove the nick translate
molecules; and

(c) removing the support-bound extended molecule from the support. 

15. The method of claim 13, the primer further comprises two or more variable 3'
terminal nucleotides.

16. The method of claim 9, wherein the method further comprises separating the 
nick translate molecules by size. 

17. The method of claim 16, wherein the size separation is by gel fractionation. 

18. The method of claim 16, wherein the method further comprises a step of 
subjecting the size-separated nick translate molecules to an additional amplification step. 

19. The method of claim 9, wherein the selective amplification step is by
suppression PCR.

20. The method of claim 19, wherein the suppression PCR utilizes a primer
comprising:

(a) a nucleic acid sequence for a primer specific for an adaptor sequence 
of the nick translate molecule; and 

(b) nucleic acid sequence complementary to a region in a plurality of nick 
translate molecules, whereby the nucleic acid sequence is 5' to the sequence for a primer 
specific for an adaptor sequence of the nick translate molecule. 

21. The method of claim 9, wherein the at least one nick translate molecule is
amplified by primer extension/ligation reactions.

22. The method of claim 21, wherein the method further comprises
immobilization of the nick translation molecules onto a solid support.

23. The method of claim 22, wherein the solid support is a magnetic bead.

24. The method of claim 21, wherein the primer extension/ligation reactions
comprise:

(a) initiating and extending the primer extension reaction with a first 
primer which is complementary to sequence in a subset of the plurality of nick translate 
molecules, wherein the complementary sequence of the nick translate molecule is adjacent to 
a first adaptor end of the nick translate molecule; and 

(b) ligating an oligonucleotide to the 5' end of the extension product,
wherein the oligonucleotide comprises sequence complementary to the first adaptor of the 
nick translate molecule and also comprises a sequence for binding by a second primer, 
wherein the second primer binding sequence in the oligonucleotide is 5' to the first adaptor 
complementary sequence in the oligonucleotide. 

25. The method of claim 24, wherein the method further comprises amplifying the 
primer extended molecule. 

26. The method of claim 25, wherein the method further comprises separating the
primer extended molecule from the plurality of nick translate molecules.

27. The method of claim 26, wherein the nick translate molecules were generated 
in the presence of dU nucleotides, the primer extended molecule contains no dU nucleotides, 
and wherein the separating step comprises degradation of the plurality of nick translate 
molecules by dU-glycosylase. 

28. The method of claim 25, wherein the amplification step comprises polymerase
chain reaction using the second primer and a primer complementary to a second adaptor of 
the nick translate molecule. 

29. The method of claim 21, wherein the ligation/primer extension reactions
comprise:

(a) ligating in a head-to-tail orientation a plurality of oligonucleotides to
form an oligonucleotide assembly, wherein the oligonucleotides are complementary to nick
translate molecule sequence adjacent to a first adaptor end of the nick translate molecule and
wherein the nick translate molecule sequence is present in a subset of the plurality of nick
translate molecules, wherein the nick translation molecule has the first adaptor on one
terminal end and a second adaptor on the other terminal end;

(b) initiating and extending the primer extension reaction with the 3' end
of the oligonucleotide assembly; and

(c) ligating an oligonucleotide to the 5' end of the extension product,
wherein the oligonucleotide comprises sequence complementary to the first adaptor of the
nick translate molecule and also comprises sequence for binding by a first primer, wherein
the first primer binding sequence is 5' to the first adaptor complementary sequence in the
oligonucleotide.

30. The method of claim 29, wherein the method further comprises the steps of: 

(a) separating the primer extended molecule from the plurality of nick
translate molecules; and

(b) amplifying the primer extended molecule. 

31. The method of claim 30, wherein the nick translate molecules were generated 
in the presence of dU nucleotides, the primer extended molecule contains no dU nucleotides, 
and wherein the separating step comprises degradation of the plurality of nick translate 
molecules by dU-glycosylase. 

32. The method of claim 30, wherein the amplification step comprises
polymerase chain reaction using the first primer and a second primer complementary to the 
second adaptor of the nick translate molecule. 

33. The method of claim 21, wherein the primer extension/ligation reaction 
comprises: 

(a) initiating and extending the primer extension reaction with a first
primer which is complementary to sequence in a subset of the plurality of nick translate
molecules, wherein the nick translate molecule sequence is adjacent to a first adaptor end of
the nick translate molecule; and

(b) ligating an oligonucleotide to the 5' end of the extension product,
wherein the oligonucleotide comprises:

(1) sequence complementary to the first adaptor of the nick
translate molecule;

(2) sequence for binding by a second primer, wherein the second
primer binding sequence is 5' to the sequence in (1); and

(3) a label at the 5' end.

34. The method of claim 33, wherein the method further comprises the steps of: 

(a) separating the primer extended molecule from the plurality of nick
translate molecules by the label of the oligonucleotide; and

(b) amplifying the primer extended molecule.

35. The method of claim 33, wherein the label is biotin.

36. The method of claim 35, wherein the separation further comprises
streptavidin-coated magnetic beads.

37. The method of claim 34, wherein the amplification step comprises polymerase 
chain reaction using the second primer and a third primer complementary to a second adaptor 
of the nick translate molecule. 

38. A method of sequencing nucleic acid, comprising the steps of: 

(a) obtaining a DNA sample comprising DNA molecules having a region
comprising a known nucleic acid sequence;

(b) partially cleaving the DNA molecules with a sequence-specific
endonuclease to generate a plurality of DNA ends;

(c) separating the cleaved DNA molecules;

(d) generating a first amplifiable nick translation product, wherein the first
amplifiable nick translation product comprises an adaptor at each end, wherein the nick
translation of said first amplifiable nick translation product initiates from a known nucleic
acid sequence;

(e) determining at least a partial sequence from said first nick translation
product;

(f) generating one or more additional amplifiable nick translation
products, wherein said nick translation of said one or more additional amplifiable nick
translation products initiates from the partial sequence of a previous nick translation product;
and

(g) sequencing the nick translation products, wherein the amplified nick
translation product is not subjected to cloning prior to the sequencing reaction.

39. The method of claim 38, wherein the DNA sample is a genome. 

40. The method of claim 38, wherein there is a limited amount of DNA sample. 

41. The method of claim 38, wherein the amplification is by polymerase chain 
reaction, and one of the primers for the polymerase chain reaction is used as a primer for the 
sequencing reaction. 

42. The method of claim 38, wherein at least a portion of the adaptor sequence is
removed from the amplified nick translation molecule.

43. The method of claim 42, wherein the removal step comprises subjecting the
amplified nick translation molecule to a 5' exonuclease.

44. The method of claim 42, wherein a region of the adaptor sequence of the nick
translate molecule comprises a dU nucleotide and the removal comprises degradation by dU-
glycosylase.

45. The method of claim 39, wherein a region of the adaptor sequence comprises a 
ribonucleotide and the removal comprises degradation by alkaline hydrolysis. 

46. The method of claim 44 or 45, the region of the second adaptor sequence is in
a 3' region of the second adaptor sequence.

47. A method of providing sequence for a gap in a genome sequence, comprising 
the steps of: 

(a) obtaining a DNA sample of the genome comprising DNA molecules 
having a region comprising a known nucleic acid sequence adjacent to the gap; 

(b) digesting the DNA molecules with a plurality of sequence-specific 
endonucleases to generate a plurality of DNA ends; 

(c) generating a first amplifiable nick translation product, wherein said 
nick translation of said first amplifiable nick translation product initiates from the known 
nucleic acid sequence; 

(d) determining at least a partial sequence from said first nick translation product; and 

(e) generating one or more additional amplifiable nick translation 
products, wherein said nick translation of said one or more amplifiable nick translation 
products initiates from the partial sequence of a previous nick translation product, wherein at 
least one of the amplifiable nick translation products comprises sequence of the gap. 

48. The method of claim 47, wherein the genome is a bacterial genome. 

49. The method of claim 47, wherein the genome is a plant genome. 

50. The method of claim 47, wherein the genome is an animal genome. 

51. The method of claim 50, wherein the animal genome is a human genome. 

52. The method of claim 48, wherein the bacteria are unculturable. 

53. The method of claim 48, wherein the bacteria is present in a plurality of 
bacteria. 

54. A method of producing a library of consecutive overlapping series of nucleic 
acid sequences from a DNA sample, comprising the steps of: 

(a) obtaining the DNA sample comprising a DNA molecule; 

(b) digesting the DNA molecule with a first sequence-specific 
endonuclease to generate a plurality of DNA fragments, wherein at least one DNA fragment 
has a region comprising a known nucleic acid sequence; 

(c) attaching a first adaptor molecule to ends of the DNA fragments to 
provide a nick translation initiation site, wherein the first adaptor comprises a label; 

(d) subjecting the first adaptor-bound DNA fragment to nick translation 
comprising DNA polymerization and 5'-3' exonuclease activity, wherein the nick translation 
initiates from the known nucleic acid sequence, to generate a first nick translation product; 

(e) isolating the nick translation product by the label; 

(f) attaching a second adaptor molecule to the first nick translate product; 

(g) determining at least a partial sequence from the first nick translation product; and 

(h) generating one or more additional amplifiable nick translation 
products, wherein said nick translation of said one or more amplifiable nick translation 
products initiates from the partial sequence of a previous nick translation product. 

55. The method of claim 54, wherein the label is biotin and the isolation step is 
binding to streptavidin-coated magnetic beads. 

56. A method of producing a library of consecutive overlapping series of nucleic 
acid sequences, comprising the steps of: 

(a) obtaining a DNA sample comprising DNA molecules having a region 
comprising a known nucleic acid sequence; 

(b) partially cleaving the DNA molecules with a sequence-specific 
endonuclease to generate a plurality of DNA fragments, wherein at least one DNA fragment 
has a region comprising a known nucleic acid sequence; 

(c) separating the cleaved DNA fragments; 

(d) attaching a first adaptor molecule to ends of the DNA fragments to 
provide a nick translation initiation site, wherein the first adaptor comprises a label; 

(e) subjecting the first adaptor-bound DNA fragment to nick translation 
comprising DNA polymerization and 5'-3' exonuclease activity, wherein the nick translation 
initiates from the known nucleic acid sequence, to generate a first nick translation product; 

(f) isolating the nick translation product by the label; 

(g) attaching a second adaptor molecule to the first nick translate products; 

(h) determining at least a partial sequence from said first nick translation product; and 

(i) generating one or more additional amplifiable nick translation 
products, wherein said nick translation of said one or more amplifiable nick translation 
products initiates from the partial sequence of said first nick translation product. 

57. The method of claim 55, wherein the separation of the DNA fragments is by size. 

58. The method of claim 57, wherein the size separation is by electrophoresis. 

59. A library of consecutive overlapping series of nucleic acid sequences from a 
DNA sample, wherein the library is generated by the method of claim 2, 4, 54, or 57. 